Abstract 

The problem of mismatched decoding for discrete memoryless channels is addressed. A mismatched cognitive 
multiple-access channel is introduced, and an inner bound on its capacity region is derived using two alternative 
encoding methods: superposition coding and random binning. The inner bounds are derived by analyzing the average 
error probability of the code ensemble for both methods and by a tight characterization of the resulting error exponents. 
Random coding converse theorems are also derived. A comparison of the achievable regions shows that in the matched 
case, random binning performs as well as superposition coding, i.e., the region achievable by random binning is equal 
to the capacity region. The achievability results are further specialized to obtain a lower bound on the mismatch 
capacity of the single-user channel by investigating a cognitive multiple access channel whose achievable sum-rate 
serves as a lower bound on the single-user channel's capacity. In certain cases, for given auxiliary random variables 
this bound strictly improves on the achievable rate derived by Lapidoth. 
I. Introduction 

The mismatch capacity is the highest achievable rate using a given decoding rule. Ideally, the decoder uses the 
maximum-likelihood rule, which minimizes the average probability of error, or another asymptotically optimal decoder 
such as the joint typicality decoder or the Maximum Mutual Information (MMI) decoder. The mismatch capacity 
reflects a practical situation in which, due to inaccurate knowledge of the channel or other practical limitations, 
the receiver is constrained to use a possibly suboptimal decoder. This paper focuses on mismatched decoders that 
are defined by a mapping $q$, which for convenience will be referred to as a "metric", from the product of the channel 
input and output alphabets to the reals. The decoding rule maximizes, among all the codewords, the accumulated 
sum of metrics between the channel output sequence and the codeword. 

Mismatched decoding has been studied extensively for discrete memoryless channels (DMCs). A random coding 
lower bound on the mismatched capacity was derived by Csiszár and Körner and by Hui [1], [2]. Csiszár and 
Narayan [4] showed that the random coding bound is not tight. They established this result by proving that the 
random coding bound for the product channel $P_{Y_1,Y_2|X_1,X_2} = P_{Y_1|X_1} \times P_{Y_2|X_2}$ (two consecutive channel uses of 
channel $P_{Y|X}$) may result in higher achievable rates. Nevertheless, it was shown in [4] that the positivity of the 
random coding lower bound is a necessary condition for a positive mismatched capacity. A converse theorem for 
the mismatched binary-input DMC was proved in [5], but in general, the problem of determining the mismatch 
capacity of the DMC remains open. 

Lapidoth introduced the mismatched multiple access channel (MAC) and derived an inner bound on its capacity 
region. The study of the MAC case led to an improved lower bound on the mismatch capacity of the single-user 
DMC by considering the maximal sum-rate of an appropriately chosen mismatched MAC whose codebook is 
obtained by expurgating codewords from the product of codebooks of the two users. 

In [3], an error exponent for random coding with fixed composition codes and mismatched decoding was 
established using a graph decomposition theorem. In a recent work, Scarlett and Guillén i Fàbregas [7] characterized 
the achievable error exponents obtained by a constant-composition random coding scheme for the MAC. For other 
related works and extensions see [8]-[14] and references therein. 

This paper introduces the cognitive mismatched two-user multiple access channel. The matched counterpart of 
this channel is in fact a special case of the MAC with a common message studied by Slepian and Wolf [15]. Encoder 
1 shares the message index it wishes to transmit with Encoder 2 (the cognitive encoder), and the latter transmits an 
additional message to the same receiver. Two achievability schemes for this channel with a mismatched decoder are 
presented. The first scheme is based on superposition coding and the second uses random binning. The achievable 
regions are compared, and an example is shown in which the achievable region obtained by random binning is 
strictly larger than the rate-region achieved by superposition coding. In general it seems that neither achievable 
region dominates the other, and conditions are shown under which random binning is guaranteed to perform at 
least as well as superposition coding in terms of achievable rates, and vice versa. As a special case it is shown that 
in the matched case, where it is well known that superposition coding is capacity-achieving, binning also achieves 
the capacity region. The resulting region of the cognitive mismatched MAC achievable by binning in fact contains 
the mismatched non-cognitive MAC achievable region studied by Lapidoth [6]. Although this is not surprising, in 
certain cases, for fixed cardinalities of the auxiliary random variables, it serves to derive an improved achievable rate for 
the mismatched single-user channel. 

The outline of this paper is as follows. Section II presents notation conventions. Section III provides some 
necessary background in more detail. Section IV introduces the mismatched cognitive MAC and presents the 
achievable regions. Section V is devoted to discussing the results pertaining to the mismatched cognitive MAC. 
Section VI presents a lower bound on the capacity of the single-user mismatched DMC. Section VII 
develops the concluding remarks. Finally, the proofs of the main results appear in the appendices. 

II. Notation and Definitions 

Throughout this paper, scalar random variables are denoted by capital letters, their sample values are denoted 
by the respective lower case letters, and their alphabets are denoted by their respective calligraphic letters, e.g. $X$, 
$x$, and $\mathcal{X}$, respectively. A similar convention applies to random vectors of dimension $n$ and their sample values, 
which are denoted with the same symbols in the boldface font, e.g., $\mathbf{x} = (x_1, \ldots, x_n)$. The set of all $n$-vectors with 
components taking values in a certain finite alphabet is denoted by the same alphabet superscripted by $n$, e.g., 
$\mathcal{X}^n$. 

Information theoretic quantities such as entropy, conditional entropy, and mutual information are denoted 
following the usual conventions in the information theory literature, e.g., $H(X)$, $H(X|Y)$, $I(X;Y)$ and so on. To 
emphasize the dependence of the quantity on a certain underlying probability distribution, say $\mu$, it is subscripted 
by $\mu$, i.e., with notations such as $H_\mu(X)$, $H_\mu(X|Y)$, $I_\mu(X;Y)$, etc. The divergence (or Kullback-Leibler distance) 
between two probability measures $\mu$ and $p$ is denoted by $D(\mu\|p)$, and when there is a need to make a distinction 
between $P$ and $Q$ as joint distributions of $(X,Y)$ as opposed to the corresponding marginal distributions of, say, 
$X$, subscripts are used to avoid ambiguity, that is, the notations $D(Q_{XY}\|P_{XY})$ and $D(Q_X\|P_X)$. The expectation 
operator is denoted by $E\{\cdot\}$, and once again, to make the dependence on the underlying distribution $\mu$ clear, it 
is denoted by $E_\mu\{\cdot\}$. The cardinality of a finite set $\mathcal{A}$ is denoted by $|\mathcal{A}|$. The indicator function of an event $\mathcal{E}$ is 
denoted by $1\{\mathcal{E}\}$. 
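As a concrete illustration of these quantities, the following minimal Python sketch computes entropy, divergence, and mutual information in bits for finite alphabets. The dictionary representation of a p.m.f. and the helper names (`entropy`, `divergence`, `marginal`, `mutual_information`) are assumptions made for this illustration and are not part of the paper.

```python
import math


def entropy(p):
    """Entropy H(X) in bits of a p.m.f. given as a dict {symbol: probability}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def divergence(p, q):
    """Divergence D(p||q) in bits; assumes q[x] > 0 wherever p[x] > 0."""
    return sum(v * math.log2(v / q[x]) for x, v in p.items() if v > 0)


def marginal(pxy, axis):
    """Marginal of a joint p.m.f. {(x, y): probability} along coordinate `axis`."""
    m = {}
    for pair, v in pxy.items():
        m[pair[axis]] = m.get(pair[axis], 0.0) + v
    return m


def mutual_information(pxy):
    """I(X;Y) computed as the divergence between P_XY and P_X x P_Y."""
    px, py = marginal(pxy, 0), marginal(pxy, 1)
    product = {(x, y): px[x] * py[y] for x in px for y in py}
    return divergence(pxy, product)
```

For example, for a binary symmetric channel with a uniform input and crossover probability 0.1, `mutual_information` applied to the joint p.m.f. agrees with one minus the binary entropy of 0.1.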

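Let $\mathcal{P}(\mathcal{X})$ denote the set of all probability measures on $\mathcal{X}$. For a given sequence $\mathbf{y} \in \mathcal{Y}^n$, $\mathcal{Y}$ being a finite alphabet, 
$\hat{P}_{\mathbf{y}}$ denotes the empirical distribution on $\mathcal{Y}$ extracted from $\mathbf{y}$, in other words, $\hat{P}_{\mathbf{y}}$ is the vector $\{\hat{P}_{\mathbf{y}}(y), y \in \mathcal{Y}\}$, 
where $\hat{P}_{\mathbf{y}}(y)$ is the relative frequency of the letter $y$ in the vector $\mathbf{y}$. The type-class of $\mathbf{x}$ is the set of $\mathbf{x}' \in \mathcal{X}^n$ 
such that $\hat{P}_{\mathbf{x}'} = \hat{P}_{\mathbf{x}}$, which is denoted $T(\hat{P}_{\mathbf{x}})$. The conditional type-class of $\mathbf{y}$ given $\mathbf{x}$ is the set of $\tilde{\mathbf{y}}$'s such that 
$\hat{P}_{\mathbf{x},\tilde{\mathbf{y}}} = \hat{P}_{\mathbf{x},\mathbf{y}} = Q_{X,Y}$, which is denoted $T(Q_{X,Y}|\mathbf{x})$ with a little abuse of notation. The set of empirical measures 
of order $n$ on alphabet $\mathcal{X}$ is denoted $\mathcal{P}_n(\mathcal{X})$. 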
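The notions of empirical distribution, type-class, and conditional type-class can be made concrete with a short Python sketch. The function names below are hypothetical and chosen only for this illustration; exact fractions are used so that two empirical distributions can be compared for equality without rounding issues.

```python
from collections import Counter
from fractions import Fraction


def empirical_distribution(seq):
    """Relative frequency of each letter appearing in seq (letters with zero count are omitted)."""
    n = len(seq)
    return {a: Fraction(c, n) for a, c in Counter(seq).items()}


def same_type(x, x_prime):
    """True iff x_prime lies in the type-class of x."""
    return len(x) == len(x_prime) and \
        empirical_distribution(x) == empirical_distribution(x_prime)


def joint_type(x, y):
    """Empirical joint distribution of the pair of sequences (x, y)."""
    return empirical_distribution(list(zip(x, y)))


def in_conditional_type_class(y_tilde, x, y):
    """True iff y_tilde has the same joint type with x as y does."""
    return len(y_tilde) == len(y) and joint_type(x, y_tilde) == joint_type(x, y)
```

Any permutation of a sequence stays in its type-class, whereas membership in a conditional type-class also depends on how the letters of the second sequence line up with those of the first.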

For two sequences of positive numbers, $\{a_n\}$ and $\{b_n\}$, the notation $a_n \doteq b_n$ means that $\{a_n\}$ and $\{b_n\}$ are of 
the same exponential order, i.e., $\frac{1}{n} \ln \frac{a_n}{b_n} \to 0$ as $n \to \infty$. Similarly, $a_n \,\dot{\leq}\, b_n$ means that $\limsup_{n\to\infty} \frac{1}{n} \ln \frac{a_n}{b_n} \leq 0$, and 
so on. Another notation is that for a real number $x$, $|x|^+ = \max\{0, x\}$. 
Throughout this paper logarithms are taken to base 2. 

III. Preliminaries 

Consider a DMC with a finite input alphabet $\mathcal{X}$ and finite output alphabet $\mathcal{Y}$, which is governed by the conditional 
p.m.f. $P_{Y|X}$. As the channel is fed by an input vector $\mathbf{x} \in \mathcal{X}^n$, it generates an output vector $\mathbf{y} \in \mathcal{Y}^n$ according to 
the sequence of conditional probability distributions 

\[ P(y_i | x_1, \ldots, x_i, y_1, \ldots, y_{i-1}) = P_{Y|X}(y_i | x_i), \quad i = 1, 2, \ldots, n \tag{1} \]

where for $i = 1$, $(y_1, \ldots, y_{i-1})$ is understood as the null string. A rate-$R$ block-code of length $n$ consists of $2^{nR}$ 
$n$-vectors $\mathbf{x}(m)$, $m = 1, 2, \ldots, 2^{nR}$, which represent $2^{nR}$ different messages, i.e., it is defined by the encoding 
function 

\[ f_n : \{1, \ldots, 2^{nR}\} \to \mathcal{X}^n. \tag{2} \]

It is assumed that all possible messages are a-priori equiprobable, i.e., $P(m) = 2^{-nR}$ for all $m$, and the 
random message is denoted by $W$. 

A mismatched decoder for the channel is defined by a mapping 

\[ q_n : \mathcal{X}^n \times \mathcal{Y}^n \to \mathbb{R}, \tag{3} \]

where the decoder declares that message $i$ was transmitted iff 

\[ q_n(\mathbf{x}(i), \mathbf{y}) > q_n(\mathbf{x}(j), \mathbf{y}), \quad \forall j \neq i, \tag{4} \]

and if no such $i$ exists, an error is declared. The results in this paper refer to the case of additive decoding functions, 
i.e., 

\[ q_n(x^n, y^n) = \frac{1}{n} \sum_{i=1}^{n} q(x_i, y_i), \tag{5} \]

where $q$ is a mapping from $\mathcal{X} \times \mathcal{Y}$ to $\mathbb{R}$. 

A rate $R$ is said to be achievable for the channel $P_{Y|X}$ with a decoding metric $q$ if there exists a sequence of 
codebooks $\mathcal{C}_n$, $n \geq 1$, of rate $R$ such that the average probability of error incurred by the decoder $q_n$ applied to 
the codebook $\mathcal{C}_n$ and the channel output vanishes as $n$ tends to infinity. The capacity of the channel with decoding 
metric $q$ is the supremum of all achievable rates. 
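The single-user decoding rule with an additive metric can be sketched in a few lines of Python. This is only an illustrative sketch: the names `additive_metric` and `mismatched_decode` are hypothetical, the codebook is assumed to be a list of equal-length sequences, and the metric is assumed to be given as a Python function `q(x, y)`.

```python
def additive_metric(q, x, y):
    """Normalized accumulated metric (1/n) * sum_i q(x_i, y_i) for sequences x and y."""
    return sum(q(a, b) for a, b in zip(x, y)) / len(x)


def mismatched_decode(codebook, y, q):
    """Return the index of the unique codeword maximizing the additive metric.

    If the maximum is attained by more than one codeword, None is returned,
    which models the declaration of an error.
    """
    scores = [additive_metric(q, x, y) for x in codebook]
    best = max(scores)
    winners = [i for i, s in enumerate(scores) if s == best]
    return winners[0] if len(winners) == 1 else None
```

With `q = lambda a, b: -(a ^ b)` on binary sequences, this reduces to minimum Hamming distance decoding in which ties are treated as errors.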

The notion of mismatched decoding can be extended to a MAC $P_{Y|X_1,X_2}$ with codebooks $\mathcal{C}_{n,1} = \{\mathbf{x}_1(i)\}$, $i = 
1, \ldots, 2^{nR_1}$, and $\mathcal{C}_{n,2} = \{\mathbf{x}_2(j)\}$, $j = 1, \ldots, 2^{nR_2}$. A mismatched decoder for a MAC is defined by the mapping 

\[ q_n : \mathcal{X}_1^n \times \mathcal{X}_2^n \times \mathcal{Y}^n \to \mathbb{R}, \tag{6} \]

where, similar to the single-user case, the decoder outputs the messages $(i,j)$ iff for all $(i',j') \neq (i,j)$ 

\[ q_n(\mathbf{x}_1(i), \mathbf{x}_2(j), \mathbf{y}) > q_n(\mathbf{x}_1(i'), \mathbf{x}_2(j'), \mathbf{y}). \tag{7} \]

The focus here is on additive decoding functions, i.e., 

\[ q_n(x_1^n, x_2^n, y^n) = \frac{1}{n} \sum_{i=1}^{n} q(x_{1,i}, x_{2,i}, y_i), \tag{8} \]

where $q$ is a mapping from $\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}$ to $\mathbb{R}$. The achievable rate-region of the MAC $P_{Y|X_1,X_2}$ with decoding 
metric $q$ is the closure of the set of rate-pairs $(R_1, R_2)$ for which there exists a sequence of codebooks $\mathcal{C}_{n,1}, \mathcal{C}_{n,2}$, 
$n \geq 1$, of rates $R_1$ and $R_2$, respectively, such that the average probability of error that is incurred by the decoder 
$q_n$ when applied to the codebooks $\mathcal{C}_{n,1}, \mathcal{C}_{n,2}$ and the channel output vanishes as $n$ tends to infinity. 

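Before describing the results pertaining to the cognitive MAC, we state the best known inner bound on the 
capacity region of the mismatched (non-cognitive) MAC, which was introduced in [6]. The inner bound is given by 
$\mathcal{R}_{LM}$, where 

\[
\mathcal{R}_{LM} = \text{closure of the CH of } \bigcup_{P_{X_1}, P_{X_2}} \left\{ (R_1, R_2) :
\begin{array}{l}
R_1 \leq \tilde{R}_1 = \min_{f \in \mathcal{D}^{(1)}} I_f(X_1;Y|X_2) + I_f(X_1;X_2) \\
R_2 \leq \tilde{R}_2 = \min_{f \in \mathcal{D}^{(2)}} I_f(X_2;Y|X_1) + I_f(X_1;X_2) \\
R_1 + R_2 \leq \tilde{R}_0 = \min_{f \in \mathcal{D}^{(0)}} I_f(X_1,X_2;Y) + I_f(X_1;X_2)
\end{array}
\right\}, \tag{9}
\]

where CH stands for "convex hull", 

\[
\begin{aligned}
\mathcal{D}^{(1)} &= \{ f_{X_1,X_2,Y} : f_{X_1} = P_{X_1},\ f_{X_2,Y} = P_{X_2,Y},\ E_f(q) \geq E_P(q) \} \\
\mathcal{D}^{(2)} &= \{ f_{X_1,X_2,Y} : f_{X_2} = P_{X_2},\ f_{X_1,Y} = P_{X_1,Y},\ E_f(q) \geq E_P(q) \} \\
\mathcal{D}^{(0)} &= \{ f_{X_1,X_2,Y} : f_{X_1} = P_{X_1},\ f_{X_2} = P_{X_2},\ f_Y = P_Y,\ E_f(q) \geq E_P(q), \\
&\qquad I_f(X_1;Y) \leq R_1,\ I_f(X_2;Y) \leq R_2 \},
\end{aligned} \tag{10}
\]

and where $P_{X_1,X_2,Y} = P_{X_1} \times P_{X_2} \times P_{Y|X_1,X_2}$. 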

IV. The Mismatched Cognitive MAC 

The two-user discrete memoryless cognitive MAC is defined by the input alphabets $\mathcal{X}_1$, $\mathcal{X}_2$, output alphabet $\mathcal{Y}$ 
and conditional transition probability $P_{Y|X_1,X_2}$. A block-code of length $n$ for the channel is defined by the two 
encoding mappings 

\[
\begin{aligned}
f_{1,n} &: \{1, \ldots, 2^{nR_1}\} \to \mathcal{X}_1^n \\
f_{2,n} &: \{1, \ldots, 2^{nR_1}\} \times \{1, \ldots, 2^{nR_2}\} \to \mathcal{X}_2^n,
\end{aligned} \tag{11}
\]

resulting in two codebooks $\{\mathbf{x}_1(i)\}$, $i = 1, \ldots, 2^{nR_1}$, and $\{\mathbf{x}_2(i,j)\}$, $i = 1, \ldots, 2^{nR_1}$, $j = 1, \ldots, 2^{nR_2}$. A mismatched 
decoder for the cognitive MAC is defined by a mapping of the form (6), where the decoder outputs the message pair $(i,j)$ 
iff for all $(i',j') \neq (i,j)$ 

\[ q_n(\mathbf{x}_1(i), \mathbf{x}_2(i,j), \mathbf{y}) > q_n(\mathbf{x}_1(i'), \mathbf{x}_2(i',j'), \mathbf{y}). \tag{12} \]

The capacity region of the cognitive mismatched MAC is defined similarly to that of the mismatched MAC. 

Denote by $W_1, W_2$ the random messages, and by $\hat{W}_1, \hat{W}_2$ the corresponding outputs of the decoder. It is said 
that $E \geq 0$ is an achievable error exponent for the MAC if there exists a sequence of codebooks $\mathcal{C}_{n,1}, \mathcal{C}_{n,2}$, 
$n \geq 1$, of rates $R_1$ and $R_2$, respectively, such that the average probability of error, $P_{e,n} = \Pr\{(\hat{W}_1, \hat{W}_2) \neq 
(W_1, W_2)\}$, that is incurred by the decoder $q_n$ when applied to codebooks $\mathcal{C}_{n,1}, \mathcal{C}_{n,2}$ and the channel output satisfies 
$\liminf_{n \to \infty} -\frac{1}{n} \log P_{e,n} \geq E$. 

Two achievability schemes tailored for the mismatched cognitive MAC are presented next. The first encoding 
scheme is based on constant composition superposition coding, and the second on constant composition random 
binning. 

Codebook Generation of User 1: The codebook of the non-cognitive user is drawn the same way in both coding 
methods. Fix a distribution $P_{X_1,X_2} \in \mathcal{P}_n(\mathcal{X}_1 \times \mathcal{X}_2)$. The codebook of user 1 is composed of $2^{nR_1}$ codewords 
$\{\mathbf{x}_1(i)\}$, $i = 1, \ldots, 2^{nR_1}$, drawn independently, each uniformly over the type-class $T(P_{X_1})$. 

Codebook Generation of User 2 and Encoding: 

• Superposition coding: For each $\mathbf{x}_1(i)$, user 2 draws $2^{nR_2}$ codewords $\mathbf{x}_2(i,j)$, $j = 1, \ldots, 2^{nR_2}$, conditionally 
independent given $\mathbf{x}_1(i)$, uniformly over the conditional type-class $T(P_{X_1,X_2}|\mathbf{x}_1(i))$. To transmit message $m_1$, 
encoder 1 transmits $\mathbf{x}_1(m_1)$. To transmit message $m_2$, encoder 2, which is cognizant of the first user's message 
$m_1$, transmits $\mathbf{x}_2(m_1, m_2)$. 

• Random binning: User 2 draws $2^{n(R_2+\gamma)}$ codewords independently, each uniformly over $T(P_{X_2})$, and partitions 
them into $2^{nR_2}$ bins, i.e., $\{\mathbf{x}_2[k,j]\}$, $k = 1, \ldots, 2^{n\gamma}$, $j = 1, \ldots, 2^{nR_2}$. The quantity $\gamma$ is given by 

\[ \gamma = I_P(X_1;X_2) + \epsilon \tag{13} \]

for an arbitrarily small $\epsilon > 0$. To transmit message $m_1$, encoder 1 transmits $\mathbf{x}_1(m_1)$. To transmit message 
$m_2$, encoder 2, which is cognizant of the first user's message $m_1$, looks for a codeword in the $m_2$-th bin, 
$\mathbf{x}_2[k, m_2]$, such that $(\mathbf{x}_1(m_1), \mathbf{x}_2[k, m_2]) \in T(P_{X_1,X_2})$. If more than one such $k$ exists, the encoder chooses 
one of them arbitrarily; if no such $k$ exists, an error is declared. Thus, the encoding of user 2 defines a mapping from the 
pairs of messages $(m_1, m_2)$ to a transmitted codeword $\mathbf{x}_2$, which is denoted by $\mathbf{x}_2(m_1, m_2)$, in parentheses, 
as opposed to the square brackets of $\mathbf{x}_2[k, m_2]$. 

Decoding: The decoder chooses $(i,j)$ such that $q_n(\mathbf{x}_1(i), \mathbf{x}_2(i,j), \mathbf{y})$ is maximal according to (12), where ties are 
regarded as errors. 
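The constant-composition superposition codebook described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the joint type is represented by a hypothetical dictionary `joint_counts` mapping a pair `(a, b)` to the number of positions in which `x1 = a` and `x2 = b`, and `M1`, `M2` stand in for the numbers of codewords $2^{nR_1}$ and $2^{nR_2}$. Drawing uniformly over a type-class is implemented by shuffling a sequence with the prescribed composition, and drawing uniformly over a conditional type-class by shuffling separately within the positions where `x1` takes each value.

```python
import random
from collections import Counter


def draw_from_type_class(counts, rng):
    """Draw a sequence uniformly over the type-class with the given symbol counts."""
    seq = [a for a, c in counts.items() for _ in range(c)]
    rng.shuffle(seq)
    return seq


def draw_from_conditional_type_class(x1, joint_counts, rng):
    """Draw x2 uniformly over the conditional type-class of joint_counts given x1.

    Assumes the composition of x1 equals the X1-marginal of joint_counts.
    """
    x2 = [None] * len(x1)
    for a in sorted(set(x1)):
        positions = [t for t, s in enumerate(x1) if s == a]
        symbols = [b for (a2, b), c in joint_counts.items() if a2 == a for _ in range(c)]
        rng.shuffle(symbols)
        for t, b in zip(positions, symbols):
            x2[t] = b
    return x2


def superposition_codebooks(joint_counts, M1, M2, seed=0):
    """Return (cb1, cb2) with cb1[i] = x1(i) and cb2[i][j] = x2(i, j)."""
    rng = random.Random(seed)
    counts1 = Counter()
    for (a, _), c in joint_counts.items():
        counts1[a] += c
    cb1 = [draw_from_type_class(counts1, rng) for _ in range(M1)]
    cb2 = [[draw_from_conditional_type_class(x1, joint_counts, rng) for _ in range(M2)]
           for x1 in cb1]
    return cb1, cb2
```

By construction every pair `(cb1[i], cb2[i][j])` has exactly the joint composition `joint_counts`, which mirrors the property that each transmitted pair lies in $T(P_{X_1,X_2})$.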

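The resulting achievable error-exponents for the mismatched cognitive MAC using superposition coding are 
presented next. Let 

\[
P^{sup}_{e,1} = \Pr\{\hat{W}_1 \neq W_1\}, \qquad P^{sup}_{e,2} = \Pr\{\hat{W}_1 = W_1, \hat{W}_2 \neq W_2\} \tag{14}
\]

when superposition coding is employed. 

Let $Q \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y})$ be given. Define the following sets of p.m.f.'s that will be useful in what follows: 

\[
\begin{aligned}
\mathcal{K}(Q) &= \{ f \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}) : f_{X_1,X_2} = Q_{X_1,X_2} \} \\
\mathcal{G}_q(Q) &= \{ f \in \mathcal{K}(Q) : E_f\{q(X_1,X_2,Y)\} \geq E_Q\{q(X_1,X_2,Y)\} \} \\
\mathcal{L}_1(Q) &= \{ f \in \mathcal{G}_q(Q) : f_{X_2,Y} = Q_{X_2,Y} \} \\
\mathcal{L}_2(Q) &= \{ f \in \mathcal{G}_q(Q) : f_{X_1,Y} = Q_{X_1,Y} \} \\
\mathcal{L}_0(Q) &= \{ f \in \mathcal{G}_q(Q) : f_Y = Q_Y \}.
\end{aligned} \tag{15}
\]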



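Theorem 1. Let $P = P_{X_1,X_2} P_{Y|X_1,X_2}$, then 

\[ P^{sup}_{e,2} \doteq 2^{-n E_2(P, R_2)} \tag{16} \]
\[ P^{sup}_{e,1} \doteq 2^{-n E_1(P, R_1, R_2)} \tag{17} \]

where 

\[
\begin{aligned}
E_2(P, R_2) &= \min_{P' \in \mathcal{K}(P)} \left[ D(P'\|P) + \min_{\tilde{P} \in \mathcal{L}_2(P')} \left| I_{\tilde{P}}(X_2;Y|X_1) - R_2 \right|^+ \right] \\
E_1(P, R_1, R_2) &= \min_{P' \in \mathcal{K}(P)} \left[ D(P'\|P) + \min_{\tilde{P} \in \mathcal{L}_0(P')} \left| I_{\tilde{P}}(X_1;Y) + \left| I_{\tilde{P}}(X_2;Y|X_1) - R_2 \right|^+ - R_1 \right|^+ \right].
\end{aligned} \tag{18}
\]

The proof of Theorem 1 can be found in Appendix A. We note that Theorem 1 implies that 

\[ E^{sup}(P, R_1, R_2) = \min\{ E_2(P, R_2), E_1(P, R_1, R_2) \} \tag{19} \]

is the error exponent induced by the superposition coding scheme. 
Define the following functions 

\[
\begin{aligned}
R_1'(P) &= \min_{\tilde{P} \in \mathcal{L}_1(P)} I_{\tilde{P}}(X_1;Y,X_2) \\
R_2'(P) &= \min_{\tilde{P} \in \mathcal{L}_2(P)} I_{\tilde{P}}(X_2;Y|X_1) \\
R_1''(P, R_2) &= \min_{\tilde{P} \in \mathcal{L}_0(P)} I_{\tilde{P}}(X_1;Y) + \left| I_{\tilde{P}}(X_2;Y|X_1) - R_2 \right|^+ \\
R_2''(P, R_1) &= \min_{\tilde{P} \in \mathcal{L}_0(P)} I_{\tilde{P}}(X_2;Y) - I_{\tilde{P}}(X_2;X_1) + \left| I_{\tilde{P}}(X_1;Y,X_2) - R_1 \right|^+.
\end{aligned} \tag{20}
\]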
Note that for $E_2(P, R_2)$, $E_1(P, R_1, R_2)$ to be zero we must have $P' = P$. Theorem 1 therefore implies that the 
following region is achievable: 

\[
\mathcal{R}^{sup}_{cog}(P) = \left\{ (R_1, R_2) :
\begin{array}{l}
R_2 \leq R_2'(P), \\
R_1 \leq R_1''(P, R_2)
\end{array}
\right\}. \tag{21}
\]



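Consider the following region: 

\[
\tilde{\mathcal{R}}^{sup}_{cog}(P) = \left\{ (R_1, R_2) :
\begin{array}{l}
R_2 \leq \min_{\tilde{P} \in \mathcal{L}_2(P)} I_{\tilde{P}}(X_2;Y|X_1), \\
R_1 + R_2 \leq \min_{\tilde{P} \in \mathcal{L}_0^{sup}(P)} I_{\tilde{P}}(X_1,X_2;Y)
\end{array}
\right\} \tag{22}
\]

where 

\[ \mathcal{L}_0^{sup}(P) = \{ \tilde{P} \in \mathcal{L}_0(P) : I_{\tilde{P}}(X_1;Y) \leq R_1 \}. \tag{23} \]

The following theorem provides a random coding converse using superposition coding, and also implies the 
equivalence of $\tilde{\mathcal{R}}^{sup}_{cog}(P)$ and $\mathcal{R}^{sup}_{cog}(P)$. 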

Theorem 2. (Random Coding Converse for Superposition Coding) If $(R_1, R_2) \notin \tilde{\mathcal{R}}^{sup}_{cog}(P_{X_1,X_2,Y})$ then the 
average probability of error, averaged over the ensemble of random codebooks drawn according to $P_{X_1,X_2}$ using 
superposition coding, approaches one as the blocklength tends to infinity. 

The proof of Theorem 2 appears in Appendix C. It follows similarly to the proof of Theorem 3 of [6]. 

Corollary 1. 

\[ \mathcal{R}^{sup}_{cog}(P) = \tilde{\mathcal{R}}^{sup}_{cog}(P). \tag{24} \]

The inclusion $\mathcal{R}^{sup}_{cog}(P) \subseteq \tilde{\mathcal{R}}^{sup}_{cog}(P)$ follows from Theorem 2 and since $\mathcal{R}^{sup}_{cog}(P)$ is an achievable region. The 
proof of the opposite direction $\tilde{\mathcal{R}}^{sup}_{cog}(P) \subseteq \mathcal{R}^{sup}_{cog}(P)$ appears in Appendix D. 

Since by definition the capacity region is a closed convex set, this yields the following achievability theorem. 

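Theorem 3. The capacity region of the finite alphabet cognitive MAC $P_{Y|X_1,X_2}$ with decoding metric $q(x_1, x_2, y)$ 
contains the set of rate-pairs 

\[ \mathcal{R}^{sup}_{cog} = \text{closure of the CH of } \bigcup_{P} \mathcal{R}^{sup}_{cog}(P) \tag{25} \]

where the union is over all $P \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y})$ with conditional $P_{Y|X_1,X_2}$ given by the channel. 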

The error exponents achievable by random binning are presented next. Let $P^{bin}_{e,1}$, $P^{bin}_{e,2}$ be defined as follows 

\[
P^{bin}_{e,1} = \Pr\{\hat{W}_1 \neq W_1\}, \qquad P^{bin}_{e,2} = \Pr\{\hat{W}_1 = W_1, \hat{W}_2 \neq W_2\} \tag{26}
\]

when random binning is employed. 

Footnote 1: Note that for convenience, in the first inequality of (22), $\min_{\tilde{P} \in \mathcal{L}_2(P)} I_{\tilde{P}}(X_2;Y|X_1)$ is written explicitly instead of the abbreviated notation 
$R_2'(P)$. 



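Theorem 4. Let $P = P_{X_1,X_2} P_{Y|X_1,X_2}$, then 

\[
\begin{aligned}
P^{bin}_{e,1} &\doteq 2^{-n \min\{ E_0(P, R_1, R_2),\, E_1(P, R_1) \}} \\
P^{bin}_{e,2} &\doteq 2^{-n E_2(P, R_2)}
\end{aligned} \tag{27}
\]

where 

\[
\begin{aligned}
E_1(P, R_1) &= \min_{P' \in \mathcal{K}(P)} \left[ D(P'\|P) + \min_{\tilde{P} \in \mathcal{L}_1(P')} \left| I_{\tilde{P}}(X_1;Y,X_2) - R_1 \right|^+ \right] \\
E_0(P, R_1, R_2) &= \max\{ E_1(P, R_1, R_2),\, E_{0,b}(P, R_1, R_2) \}
\end{aligned} \tag{28}
\]

and where 

\[
E_{0,b}(P, R_1, R_2) = \min_{P' \in \mathcal{K}(P)} \left[ D(P'\|P) + \min_{\tilde{P} \in \mathcal{L}_0(P')} \left| I_{\tilde{P}}(X_2;Y) - I_{\tilde{P}}(X_1;X_2) + \left| I_{\tilde{P}}(X_1;Y,X_2) - R_1 \right|^+ - R_2 \right|^+ \right] \tag{29}
\]

with $E_2(P, R_2)$ and $E_1(P, R_1, R_2)$ defined in (18). 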

The proof of Theorem 4 appears in Appendix B. The derivation of the exponent associated with $P^{bin}_{e,1}$ makes use 
of [7, Lemma 3], where achievable error exponents for the non-cognitive MAC obtained by constant-composition 
random coding are characterized. We note that Theorem 4 implies that 

\[ E^{bin}(P, R_1, R_2) = \min\{ E_2(P, R_2),\, E_0(P, R_1, R_2),\, E_1(P, R_1) \} \tag{30} \]

is the error exponent induced by the random binning scheme. Theorem 4 also implies that for fixed $P = 
P_{X_1,X_2} P_{Y|X_1,X_2}$, the following rate-region is achievable: 

\[
\mathcal{R}^{bin}_{cog}(P) = \left\{ (R_1, R_2) :
\begin{array}{l}
R_1 \leq R_1'(P), \\
R_2 \leq R_2'(P), \\
R_1 \leq R_1''(P, R_2) \ \text{or} \ R_2 \leq R_2''(P, R_1)
\end{array}
\right\}, \tag{31}
\]

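where $R_1'(P)$, $R_2'(P)$, $R_1''(P, R_2)$, $R_2''(P, R_1)$ are defined in (20). Next it is proven that $\mathcal{R}^{bin}_{cog}(P)$ has the following 
alternative expression. Consider the rate-region: 

\[
\tilde{\mathcal{R}}^{bin}_{cog}(P) = \left\{ (R_1, R_2) :
\begin{array}{l}
R_1 \leq \min_{\tilde{P} \in \mathcal{L}_1(P)} I_{\tilde{P}}(X_1;Y,X_2), \\
R_2 \leq \min_{\tilde{P} \in \mathcal{L}_2(P)} I_{\tilde{P}}(X_2;Y|X_1), \\
R_1 + R_2 \leq \min_{\tilde{P} \in \mathcal{L}_0^{bin}(P)} I_{\tilde{P}}(X_1,X_2;Y)
\end{array}
\right\} \tag{32}
\]

where 

\[
\mathcal{L}_0^{bin}(P) = \{ \tilde{P} \in \mathcal{L}_0(P) : I_{\tilde{P}}(X_1;Y) \leq R_1,\ I_{\tilde{P}}(X_2;Y) - I_{\tilde{P}}(X_2;X_1) \leq R_2 \}. \tag{33}
\]

The following theorem provides a random coding converse for random binning and it also implies that $\tilde{\mathcal{R}}^{bin}_{cog}(P)$ is 
an alternative expression for $\mathcal{R}^{bin}_{cog}(P)$. 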

Theorem 5. (Random Coding Converse for Random Binning) If $(R_1, R_2) \notin \tilde{\mathcal{R}}^{bin}_{cog}(P_{X_1,X_2,Y})$ then the average 
probability of error, averaged over the ensemble of random codebooks drawn according to $P_{X_1,X_2}$ using binning, 
approaches one as the blocklength tends to infinity. 

The proof of Theorem 5 adheres closely to that of Theorem 2 and is thus omitted. 

Corollary 2. 

\[ \tilde{\mathcal{R}}^{bin}_{cog}(P) = \mathcal{R}^{bin}_{cog}(P). \tag{34} \]

The inclusion $\mathcal{R}^{bin}_{cog}(P) \subseteq \tilde{\mathcal{R}}^{bin}_{cog}(P)$ follows from Theorem 5 and since $\mathcal{R}^{bin}_{cog}(P)$ is an achievable region. The 
proof of the opposite direction $\tilde{\mathcal{R}}^{bin}_{cog}(P) \subseteq \mathcal{R}^{bin}_{cog}(P)$ appears in Appendix E. 

Note that in fact $\mathcal{R}^{bin}_{cog}(P)$ can be potentially enlarged as follows: 

Lemma 1. Let $(R_1, R_2) \in \mathcal{R}^{bin}_{cog}(P)$, then $(R_1 + R_2, 0)$ is also achievable by random binning. 

Proof: The lemma follows since the cognitive encoder can assign some of the information it transmits to the 
non-cognitive user. The message $W_1$ can be split to $W_{1,a}$ and $W_{1,b}$, corresponding to rates $R_{1,a}$ and $R_{1,b}$. User 1 
transmits $W_{1,a}$ and user 2 transmits $(W_2, W_{1,b})$. The achievable rate-region becomes 

\[
\mathcal{R}^{bin,*}_{cog}(P) = \left\{ (R_1, R_2) : \exists\, R_{1,a}, R_{1,b} \geq 0 :
\begin{array}{l}
R_1 = R_{1,a} + R_{1,b}, \\
R_{1,a} \leq R_1'(P), \\
R_{1,b} + R_2 \leq R_2'(P), \\
R_{1,a} \leq R_1''(P, R_{1,b} + R_2) \ \text{or} \ R_{1,b} + R_2 \leq R_2''(P, R_{1,a})
\end{array}
\right\}. \tag{35}
\]



The resulting region of rates achievable by random binning is described in the following theorem. 



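Theorem 6. The capacity region of the finite alphabet cognitive MAC $P_{Y|X_1,X_2}$ with decoding metric $q(x_1, x_2, y)$ 
contains the set of rate-pairs 

\[ \mathcal{R}^{bin}_{cog} = \text{closure of the CH of } \bigcup_{P} \mathcal{R}^{bin,*}_{cog}(P) \tag{36} \]

where the union is over all $P \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y})$ with conditional $P_{Y|X_1,X_2}$ given by the channel. 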
V. Discussion - Mismatched Cognitive MAC 
A few comments are in order: 

• Next, a lemma is proved which establishes the fact that the achievable region of the cognitive mismatched 
MAC, $\mathcal{R}^{bin}_{cog}$, contains the achievable region of the mismatched MAC, $\mathcal{R}_{LM}$ (9). 
Lemma 2. $\mathcal{R}_{LM} \subseteq \mathcal{R}^{bin}_{cog}$. 
Proof: 

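For this proof we use the expression (32). Recall the definition of $\tilde{R}_2$ in (9); it satisfies 

\[
\begin{aligned}
\tilde{R}_2 &= \min_{f \in \mathcal{D}^{(2)}} I_f(X_2;Y|X_1) + I_f(X_1;X_2) \\
&= \min_{f \in \mathcal{D}^{(2)}} D(f_{X_1,X_2,Y} \| P_{X_2} P_{X_1,Y}) \\
&\leq \min_{f \in \mathcal{D}^{(2)} :\, f_{X_1,X_2} = P_{X_1} P_{X_2}} D(f_{X_1,X_2,Y} \| P_{X_2} P_{X_1,Y}) \\
&= \min_{f \in \mathcal{L}_2(P_{X_1} P_{X_2} P_{Y|X_1,X_2})} I_f(X_2;Y|X_1) \\
&= R_2'(P_{X_1} P_{X_2} P_{Y|X_1,X_2}),
\end{aligned} \tag{37}
\]

where $R_2'(P)$ is defined in (20). Similarly, 

\[
\begin{aligned}
\tilde{R}_0 &= \min_{f \in \mathcal{D}^{(0)}} I_f(X_1,X_2;Y) + I_f(X_1;X_2) \\
&= \min_{f \in \mathcal{D}^{(0)}} D(f_{X_1,X_2,Y} \| P_{X_2} P_{X_1} P_Y) \\
&\leq \min_{f \in \mathcal{D}^{(0)} :\, f_{X_1,X_2} = P_{X_1} P_{X_2}} I_f(X_1,X_2;Y),
\end{aligned} \tag{38}
\]

where the inequality follows since $\mathcal{L}_0^{bin}(P_{X_1} P_{X_2} P_{Y|X_1,X_2}) \subseteq \mathcal{D}^{(0)}$. A similar inequality can be derived for 
$\tilde{R}_1$ and is omitted. 

The definition of $\mathcal{R}^{bin}_{cog}$ (36) includes a union over all $P_{X_1,X_2}$, including product p.m.f.'s of the form $P_{X_1,X_2} = 
P_{X_1} P_{X_2}$, whereas the definition of $\mathcal{R}_{LM}$ (9) includes a union over product p.m.f.'s alone, and thus $\mathcal{R}_{LM} \subseteq 
\mathcal{R}^{bin}_{cog}$. ∎ 

The fact that $\mathcal{R}_{LM} \subseteq \mathcal{R}^{bin}_{cog}$ is not surprising, as one expects that an achievable region of a mismatched cognitive 
MAC should be larger than that of a mismatched (non-cognitive) MAC. 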
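• Next it is proved that in the matched case $\mathcal{R}^{sup}_{cog} = \mathcal{R}^{bin}_{cog}$ and the regions are both equal to the matched capacity 
region. 

Proposition 1. In the matched case where 

\[ q(x_1, x_2, y) = \log p(y|x_1, x_2), \tag{39} \]

we have $\mathcal{R}^{sup}_{cog} = \mathcal{R}^{bin}_{cog} = \mathcal{R}^{match}_{cog}$, where 

\[ \mathcal{R}^{match}_{cog} = \bigcup_{P_{X_1,X_2}} \{ (R_1, R_2) : R_2 \leq I_P(X_2;Y|X_1),\ R_1 + R_2 \leq I_P(X_1,X_2;Y) \}, \tag{40} \]

and $P$ abbreviates $P_{X_1,X_2} P_{Y|X_1,X_2}$. 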

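Proof: The proof that $\mathcal{R}^{sup}_{cog} = \mathcal{R}^{match}_{cog}$ follows very similarly to the proof of [6, Proposition 1], and is 
thus omitted. To prove $\mathcal{R}^{bin}_{cog} = \mathcal{R}^{match}_{cog}$, note that as in [6, Proposition 1], for every $\tilde{P} \in \mathcal{L}_1(P)$, we have 

\[
\begin{aligned}
I_{\tilde{P}}(X_1;Y,X_2) &= I_{\tilde{P}}(X_1;Y|X_2) + I_{\tilde{P}}(X_1;X_2) \\
&= H_{\tilde{P}}(Y|X_2) - H_{\tilde{P}}(Y|X_1,X_2) + I_{\tilde{P}}(X_1;X_2) \\
&\overset{(a)}{\geq} H_{\tilde{P}}(Y|X_2) + E_{\tilde{P}} \log(P_{Y|X_1,X_2}) + I_{\tilde{P}}(X_1;X_2) \\
&\overset{(b)}{\geq} H_P(Y|X_2) + E_P \log(P_{Y|X_1,X_2}) + I_P(X_1;X_2) \\
&= I_P(X_1;Y,X_2),
\end{aligned} \tag{41}
\]

where (a) follows from the non-negativity of the divergence, and (b) follows since $\tilde{P} \in \mathcal{L}_1(P)$, so that $\tilde{P}_{X_2,Y} = P_{X_2,Y}$, 
$\tilde{P}_{X_1,X_2} = P_{X_1,X_2}$, and $E_{\tilde{P}}\{q\} \geq E_P\{q\}$ with $q = \log P_{Y|X_1,X_2}$. Similarly, one can show that for every $\tilde{P} \in \mathcal{L}_2(P)$ 

\[ I_{\tilde{P}}(X_2;Y|X_1) \geq I_P(X_2;Y|X_1), \tag{42} \]

and that for all $\tilde{P} \in \mathcal{L}_0(P)$ 

\[ I_{\tilde{P}}(X_1,X_2;Y) \geq I_P(X_1,X_2;Y). \tag{43} \]

This yields that the union of rate-pairs $(R_1, R_2)$ achievable by binning contains the rate-pairs satisfying 

\[
\begin{aligned}
R_1 &\leq I_P(X_1;Y,X_2) \\
R_2 &\leq I_P(X_2;Y|X_1) \\
R_1 + R_2 &\leq I_P(X_1,X_2;Y).
\end{aligned} \tag{44}
\]

But, since $I_P(X_1;Y,X_2) \geq I_P(X_1;Y)$, the sum of the bounds on the individual rates is looser than the sum-rate 
bound, and we get the achievable vertex point $(R_1, R_2) = (I_P(X_1;Y), I_P(X_2;Y|X_1))$. By enlarging 
this region according to Lemma 1 combined with time-sharing, the region 

\[
\begin{aligned}
R_2 &\leq I_P(X_2;Y|X_1) \\
R_1 + R_2 &\leq I_P(X_1,X_2;Y)
\end{aligned} \tag{45}
\]

is achievable using random binning and is contained in $\mathcal{R}^{bin,*}_{cog}(P)$ and thus also in $\mathcal{R}^{bin}_{cog}$. This implies that 
$\mathcal{R}^{match}_{cog} \subseteq \mathcal{R}^{bin}_{cog}$, and hence $\mathcal{R}^{match}_{cog} = \mathcal{R}^{bin}_{cog}$. ∎ 
Proposition 1 is clearly an example for which $\mathcal{R}_{LM} \subseteq \mathcal{R}^{bin}_{cog} = \mathcal{R}^{sup}_{cog}$, with obvious cases in which the inclusion 
is strict. 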
• Next we compare the achievable regions using superposition coding and random binning, i.e., $\mathcal{R}^{sup}_{cog}(P)$ (21) and 
$\mathcal{R}^{bin,*}_{cog}(P)$ (defined in Lemma 1). In principle, neither of the regions $\mathcal{R}^{sup}_{cog}(P)$, $\mathcal{R}^{bin,*}_{cog}(P)$ dominates the other, as the 
second inequality in (21) is stricter than the third inequality in (31), and the first inequality in (31) does not appear 
in (21). 

It is easily verified that unless $R_1''(P, R_2'(P)) > R_1'(P)$ and $R_2''(P, R_1'(P)) > R_2'(P)$ (that is, unless the sum of 
the individual rate bounds is stricter than both sum-rate bounds), we have $\mathcal{R}^{sup}_{cog}(P) \subseteq \mathcal{R}^{bin,*}_{cog}(P)$; otherwise, 
the opposite inclusion $\mathcal{R}^{bin,*}_{cog}(P) \subseteq \mathcal{R}^{sup}_{cog}(P)$ may occur. 

An example is presented next for which $\mathcal{R}^{sup}_{cog} \subset \mathcal{R}^{bin}_{cog}$ with strict inclusion. Consider the following parallel 
MAC, which is a special case of the channel that was studied by Lapidoth [6, Section IV, Example 2]. Let the 
alphabets $\mathcal{X}_1 = \mathcal{X}_2 = \mathcal{Y}_1 = \mathcal{Y}_2$ be binary $\{0, 1\}$. The output of the channel is given by 

\[ Y = (Y_1, Y_2) \tag{46} \]

where 

\[
\begin{aligned}
Y_1 &= X_1 \\
Y_2 &= X_2 \oplus Z,
\end{aligned} \tag{47}
\]

$\oplus$ denotes modulo-2 addition, and $Z \sim \text{Bernoulli}(p)$, with the decoding metric 

\[ q(x_1, x_2, (y_1, y_2)) = -\frac{1}{2} (x_1 \oplus y_1 + x_2 \oplus y_2). \tag{48} \]

Now, the capacity region of this channel with non-cognitive users was established by Lapidoth and is given 
by the rectangle 

\[
\mathcal{R}_{LM} = \left\{ (R_1, R_2) :
\begin{array}{l}
R_2 \leq 1 - h_2(p) \\
R_1 \leq 1
\end{array}
\right\} \tag{49}
\]

where $h_2(p) = -p \log(p) - (1 - p) \log(1 - p)$. 

From Lemma 2 we know that $\mathcal{R}_{LM} \subseteq \mathcal{R}^{bin}_{cog}$. Consequently, from Lemma 1 we obtain that $\mathcal{R}^{bin}_{cog}$ contains the 
region 

\[
\left\{ (R_1, R_2) :
\begin{array}{l}
R_2 \leq 1 - h_2(p) \\
R_1 + R_2 \leq 2 - h_2(p)
\end{array}
\right\}, \tag{50}
\]

and since this is also the capacity region of the matched cognitive MAC, it can be concluded that random binning 
combined with enlargement of $R_1$ according to Lemma 1 achieves the capacity region. 

Next, it is demonstrated that the vertex point $(R_1, R_2) = (1, 1 - h_2(p))$ is not achievable by superposition 
coding when the cognitive user is user 2. Consider the sum-rate bound in $\tilde{\mathcal{R}}^{sup}_{cog}(P)$ (22): 

\[
\begin{aligned}
R_1 + R_2 &\leq \max_{P_{X_1,X_2}} \min_{\tilde{P} \in \mathcal{L}_0^{sup}(P)} I_{\tilde{P}}(X_1,X_2;Y_1,Y_2) \\
&= \max_{P_{X_1,X_2}} \min_{\tilde{P} \in \mathcal{L}_0(P) :\, I_{\tilde{P}}(X_1;Y) \leq R_1} I_{\tilde{P}}(X_1,X_2;Y_1,Y_2).
\end{aligned} \tag{51}
\]

Consequently, for $R_1 = 1$, we obtain 

\[
\begin{aligned}
R_2 &\leq \max_{P_{X_1,X_2}} \min_{\tilde{P} \in \mathcal{L}_0(P) :\, I_{\tilde{P}}(X_1;Y) \leq 1} I_{\tilde{P}}(X_1,X_2;Y_1,Y_2) - 1 \\
&= \max_{P_{X_1,X_2}} \min_{\tilde{P} \in \mathcal{L}_0(P)} I_{\tilde{P}}(X_1,X_2;Y_1,Y_2) - 1,
\end{aligned} \tag{52}
\]

since clearly $I_{\tilde{P}}(X_1;Y) \leq 1$ is always satisfied as $X_1$ is binary. Now, the term 
$\max_{P_{X_1,X_2}} \min_{\tilde{P} \in \mathcal{L}_0(P)} I_{\tilde{P}}(X_1,X_2;Y_1,Y_2)$ is simply the maximal achievable rate by ordinary random 
coding for the single-user channel from $X = (X_1, X_2)$ to $Y = (Y_1, Y_2)$, which was characterized for this 
channel in [6, Section IV, Example 2], and is given by $2(1 - h_2(p/2))$. This yields that if $R_1 = 1$ then $R_2$ 
achievable by superposition coding satisfies 

\[ R_2 \leq 1 - 2 h_2(p/2), \tag{53} \]

which is strictly lower than $R_2 = 1 - h_2(p)$, which is achievable by binning. 
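The gap between the two values of $R_2$ at $R_1 = 1$ can be checked numerically. The short Python sketch below evaluates the superposition bound $1 - 2h_2(p/2)$ from (53) and the binning rate $1 - h_2(p)$ for a few crossover probabilities; the function names and the sample values of $p$ are assumptions made for this illustration.

```python
import math


def h2(p):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def r2_binning(p):
    """Rate R_2 = 1 - h2(p) achievable by binning at R_1 = 1."""
    return 1.0 - h2(p)


def r2_superposition_bound(p):
    """Upper bound 1 - 2*h2(p/2) on R_2 for superposition coding at R_1 = 1."""
    return 1.0 - 2.0 * h2(p / 2.0)


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.11, 0.25):
        print(f"p={p}: superposition bound {r2_superposition_bound(p):.4f}, "
              f"binning {r2_binning(p):.4f}")
```

The strict gap for $0 < p < 1$ reflects the concavity of $h_2$, which gives $h_2(p/2) > h_2(p)/2$. A negative value of the superposition bound simply means that no positive $R_2$ is compatible with $R_1 = 1$ under that bound.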

It should be noted that although $\mathcal{R}^{sup}_{cog} \subset \mathcal{R}^{bin}_{cog}$ in this case, if the roles of the users were reversed, i.e., user 1 
were cognitive and user 2 were non-cognitive, the vertex point $(R_1, R_2) = (1, 1 - h_2(p))$ would have been 
achievable by superposition as well (see the explanation following Theorem 7). 
• The following proposition provides a condition for $\mathcal{R}^{sup}_{cog}(P) \subseteq \mathcal{R}^{bin}_{cog}(P)$. 

Proposition 2. If $R_2'(P) \geq I_P(X_2;Y) - I_P(X_1;X_2)$ then $\mathcal{R}^{sup}_{cog}(P) \subseteq \mathcal{R}^{bin}_{cog}(P)$. 

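Proof: It is argued that if $R_2'(P) \geq I_P(X_2;Y) - I_P(X_1;X_2)$, the constraint $R_1 \leq R_1'(P)$ in (31) is 
looser than $R_1 \leq R_1''(P, R_2)$. To realize this, let $R_2 = I_P(X_2;Y) - I_P(X_1;X_2) + \Delta$ where $\Delta \geq 0$. We have 

\[
\begin{aligned}
R_1''(P, R_2) &= \min_{\tilde{P} \in \mathcal{L}_0(P)} I_{\tilde{P}}(X_1;Y) + \left| I_{\tilde{P}}(X_2;Y|X_1) - R_2 \right|^+ \\
&\leq \min_{\tilde{P} \in \mathcal{L}_0(P)} I_{\tilde{P}}(X_1;Y) + \left| I_{\tilde{P}}(X_2;Y|X_1) - R_2 + \Delta \right|^+ \\
&\overset{(a)}{\leq} \min_{\tilde{P} \in \mathcal{L}_1(P)} I_{\tilde{P}}(X_1;Y) + \left| I_{\tilde{P}}(X_2;Y|X_1) - R_2 + \Delta \right|^+ \\
&\overset{(b)}{=} \min_{\tilde{P} \in \mathcal{L}_1(P)} I_{\tilde{P}}(X_1;Y,X_2) \\
&= R_1'(P),
\end{aligned} \tag{54}
\]

where (a) follows since $\mathcal{L}_1(P) \subseteq \mathcal{L}_0(P)$, and (b) follows since for all $\tilde{P} \in \mathcal{L}_1(P)$, 

\[
\begin{aligned}
I_{\tilde{P}}(X_2;Y|X_1) &= I_{\tilde{P}}(X_2;Y) - I_{\tilde{P}}(X_2;X_1) + I_{\tilde{P}}(X_2;X_1|Y) \\
&= I_P(X_2;Y) - I_P(X_2;X_1) + I_{\tilde{P}}(X_2;X_1|Y),
\end{aligned} \tag{55}
\]

so that $I_{\tilde{P}}(X_2;Y|X_1) - R_2 + \Delta = I_{\tilde{P}}(X_2;X_1|Y) \geq 0$ and $I_{\tilde{P}}(X_1;Y) + I_{\tilde{P}}(X_1;X_2|Y) = I_{\tilde{P}}(X_1;Y,X_2)$. ∎ 

In fact, Proposition 2 generalizes the fact that binning performs as well as superposition coding in the matched 
case, since in the matched case one has $R_2'(P) \geq I_P(X_2;Y) - I_P(X_1;X_2)$. 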






VI. The Mismatched Single-User Channel 

This section shows that achievable rates for the mismatched single-user DMC can be derived from the maximal 
sum-rate of an appropriately chosen mismatched cognitive MAC. 

Similar to the definitions in [6], consider the single-user mismatched DMC $P_{Y|X}$ with input alphabet $\mathcal{X}$ and 
decoding metric $q(x, y)$. Let $\mathcal{X}_1$ and $\mathcal{X}_2$ be finite alphabets and let $\phi$ be a given mapping $\phi : \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{X}$. We 
will study the rate-region of the mismatched cognitive MAC with input alphabets $\mathcal{X}_1$, $\mathcal{X}_2$ and output alphabet $\mathcal{Y}$, 
whose input-output relation is given by 

\[ P_{Y|X_1,X_2}(y|x_1, x_2) = P_{Y|X}(y|\phi(x_1, x_2)), \tag{56} \]

where the right hand side is the probability of the output of the single-user channel to be $y$ given that its input is 
$\phi(x_1, x_2)$. The decoding metric $q(x_1, x_2, y)$ of the mismatched cognitive MAC is defined in terms of that of the 
single-user channel: 

\[ q(x_1, x_2, y) = q(\phi(x_1, x_2), y). \tag{57} \]

The resulting mismatched cognitive MAC will be referred to as the cognitive MAC induced by the single-user 
channel, or more specifically, induced by $(P_{Y|X}, q(x, y), \mathcal{X}_1, \mathcal{X}_2, \phi)$. Note that, in fact, $X_1$ and $X_2$ can be regarded 
as auxiliary random variables for the original single-user channel $P_{Y|X}$. 
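The construction of the induced cognitive MAC in (56) and (57) amounts to composing the single-user channel and metric with the mapping $\phi$. A minimal Python sketch follows; the name `induced_mac` and the representation of the channel as a nested dictionary `P_Y_given_X[x][y]` are assumptions made for this illustration only.

```python
def induced_mac(P_Y_given_X, q, phi):
    """Build the MAC transition law and metric induced by a single-user channel.

    P_Y_given_X: dict such that P_Y_given_X[x][y] is the probability of output y given input x.
    q:           single-user decoding metric, a function q(x, y).
    phi:         mapping phi(x1, x2) from the pair of MAC inputs to a single-user input.
    """
    def P_mac(y, x1, x2):
        # Induced transition probability of y given (x1, x2), as in (56).
        return P_Y_given_X[phi(x1, x2)][y]

    def q_mac(x1, x2, y):
        # Induced decoding metric, as in (57).
        return q(phi(x1, x2), y)

    return P_mac, q_mac
```

For instance, a quaternary input alphabet $\{0,1,2,3\}$ can be split into two binary auxiliary inputs with `phi = lambda x1, x2: 2 * x1 + x2`.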

In [6, Theorem 4], it is shown that the mismatch capacity of the single-user channel is lower-bounded by $R_1 + R_2$ 
for any pair $(R_1, R_2)$ that, for some mapping $\phi$ and for some distributions $P_{X_1}$ and $P_{X_2}$, satisfies 

\[
\begin{aligned}
R_1 &\leq \min_{f \in \mathcal{D}^{(1)} :\, f_{X_1,X_2} = f_{X_1} f_{X_2}} I_f(X_1;Y|X_2) \\
R_2 &\leq \min_{f \in \mathcal{D}^{(2)} :\, f_{X_1,X_2} = f_{X_1} f_{X_2}} I_f(X_2;Y|X_1) \\
R_1 + R_2 &\leq \min_{f \in \mathcal{D}^{(0)} :\, f_{X_1,X_2} = f_{X_1} f_{X_2}} I_f(X_1,X_2;Y),
\end{aligned} \tag{58}
\]

where $\mathcal{D}^{(i)}$, $i = 0, 1, 2$, are defined in (10) and $P_{X_1,X_2,Y} = P_{X_1} P_{X_2} P_{Y|\phi(X_1,X_2)}$. 

The proof of [6, Theorem 4] is based on expurgating the product codebook of the MAC (56) containing $2^{n(R_1+R_2)}$ 
codewords $\mathbf{v}(i,j) = (\mathbf{x}_1(i), \mathbf{x}_2(j))$ and only keeping the $\mathbf{v}(i,j)$'s that are composed of $\mathbf{x}_1(i), \mathbf{x}_2(j)$ which are 
jointly $\epsilon$-typical with respect to the product p.m.f. $P_{X_1} \times P_{X_2}$. It is shown that the expurgation causes a negligible 
loss of rate and therefore makes it possible to consider minimization over product measures in (58). 

It is easy to realize that in the cognitive MAC case as well, if $(R_1,R_2)$ is an achievable rate pair for the induced cognitive MAC, then $R_1 + R_2$ is an achievable rate for the inducing single-user channel. While the users of the non-cognitive induced MAC of [6] exercise a limited degree of cooperation (by expurgating the appropriate codewords of the product codebook), the induced cognitive MAC introduced here enables a much higher degree of cooperation between users. There is no need to expurgate codewords in the case of either superposition coding or random binning, since the codebook generation guarantees that every pair $(x_1(i), x_2(i,j))$ lies in the desired joint type class $T(P_{X_1,X_2})$.

Let $\mathcal{R}_{cog}(P) = \mathcal{R}^{sup}_{cog}(P) \cup \mathcal{R}^{bin}_{cog}(P)$. For convenience, the dependence of $\mathcal{R}_{cog}(P)$ on $q$ is made explicit, and the region is denoted by $\mathcal{R}_{cog}(P_{X_1,X_2,Y}, q(x_1,x_2,y))$.

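Theorem 7. For all finite $\mathcal{X}_1, \mathcal{X}_2$, $P_{X_1,X_2} \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2)$ and $\phi: \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{X}$, the capacity of the single-user mismatched DMC $P_{Y|X}$ with decoding metric $q(x, y)$ is lower-bounded by the sum-rate resulting from

$$\mathcal{R}_{cog}\big(P_{X_1,X_2} \times P_{Y|\phi(X_1,X_2)},\ q(\phi(x_1,x_2),y)\big).$$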

While Theorem 7 may not improve on the rate (58) achieved in [6, Theorem 4] when optimizing over all $(\mathcal{X}_1, \mathcal{X}_2, \phi, P_{X_1}, P_{X_2})$, it can certainly improve the achieved rates for given $(\phi, P_{X_1}, P_{X_2})$, as demonstrated in Section V, and thereby may reduce the computational complexity required to find a good code.

Next, it is demonstrated how superposition coding can be used to achieve the rate of Theorem 7. Consider the region of rate pairs $(R_1,R_2)$ which satisfy

$$\begin{aligned}
R_1 &< r_1(P) = \min_{\tilde P \in \mathcal{L}_1(P)} I_{\tilde P}(X_1;Y|X_2), \\
R_2 &< r_2(P,R_1) = \min_{\tilde P \in \mathcal{L}_0(P)} I_{\tilde P}(X_2;Y) + \big|I_{\tilde P}(X_1;Y|X_2) - R_1\big|^+ .
\end{aligned} \tag{59}$$

This region is obtained by reversing the roles of the users in (21), i.e., setting user 1 as the cognitive one.

We next show that the sum-rate resulting from $\mathcal{R}^{bin}_{cog}(P)$ (31) is upper-bounded by the sum-rate resulting from the union of the regions (59) and (21) that are achievable by superposition coding. To verify this, note that (31) is contained in the union of rate regions

$$\left\{(R_1,R_2):\ \begin{array}{l} R_2 < R_2'(P), \\ R_1 < R_1'(P,R_2) \end{array}\right\} \cup \left\{(R_1,R_2):\ \begin{array}{l} R_1 < R_1'(P), \\ R_2 < R_2''(P,R_1) \end{array}\right\}, \tag{60}$$

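and the region on the l.h.s. is equal to $\mathcal{R}^{sup}_{cog}(P)$ (21). Now, let $(R_1^*, R_2^*)$ be the vertex point of the region on the r.h.s., i.e., the point which satisfies

$$\begin{aligned}
R_1^* &= \min_{\tilde P \in \mathcal{L}_1(P)} I_{\tilde P}(X_1;Y,X_2), \\
R_2^* &= \min_{\tilde P \in \mathcal{L}_0(P)} I_{\tilde P}(X_2;Y) - I_{\tilde P}(X_2;X_1) + \big|I_{\tilde P}(X_1;Y,X_2) - R_1^*\big|^+ .
\end{aligned} \tag{61}$$

By definition of $\mathcal{L}_1(P)$ and $\mathcal{L}_0(P)$, denoting $R_1^{**} = R_1^* - I_P(X_2;X_1)$ and $R_2^{**} = R_2^* + I_P(X_2;X_1)$, this yields

$$\begin{aligned}
R_1^{**} &= \min_{\tilde P \in \mathcal{L}_1(P)} I_{\tilde P}(X_1;Y|X_2), \\
R_2^{**} &= \min_{\tilde P \in \mathcal{L}_0(P)} I_{\tilde P}(X_2;Y) + \big|I_{\tilde P}(X_1;Y|X_2) - R_1^{**}\big|^+ .
\end{aligned} \tag{62}$$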
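The passage from (61) to (62) can be spelled out as follows. This is only an expanded restatement of the step, with $\tilde P$ denoting the minimization variable. Every $\tilde P$ in $\mathcal{L}_1(P)$ or $\mathcal{L}_0(P)$ has the same $(X_1,X_2)$-marginal as $P$, so $I_{\tilde P}(X_1;X_2) = I_P(X_1;X_2)$ is a constant that can be moved in and out of the minimizations:

```latex
\begin{align*}
I_{\tilde P}(X_1;Y,X_2) &= I_P(X_1;X_2) + I_{\tilde P}(X_1;Y|X_2), \\
R_1^{**} = R_1^{*} - I_P(X_1;X_2)
  &= \min_{\tilde P\in\mathcal{L}_1(P)} I_{\tilde P}(X_1;Y|X_2), \\
I_{\tilde P}(X_1;Y,X_2) - R_1^{*}
  &= I_{\tilde P}(X_1;Y|X_2) - R_1^{**}, \\
R_2^{**} = R_2^{*} + I_P(X_1;X_2)
  &= \min_{\tilde P\in\mathcal{L}_0(P)} I_{\tilde P}(X_2;Y)
     + \big|I_{\tilde P}(X_1;Y|X_2) - R_1^{**}\big|^{+}.
\end{align*}
```

The first line is the chain rule for mutual information; the remaining lines substitute it into (61).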

Since clearly $(R_1^{**}, R_2^{**})$ lies in (59) and since $R_1^{**} + R_2^{**} = R_1^* + R_2^*$, we obtain that the sum-rate resulting from (59) is equal to that of the r.h.s. of (60). Consequently, the sum-rate that can be achieved by the union of the two superposition coding schemes (with user 1 the cognitive and user 2 the non-cognitive, and vice versa) is an upper bound on the sum-rate achievable by binning, and in Theorem 7 one can replace $\mathcal{R}_{cog}(P_{X_1,X_2} \times P_{Y|\phi(X_1,X_2)}, q(\phi(x_1,x_2), y))$ with the union of the regions (59) and (21). This yields the following corollary.

Corollary 3. For all finite $\mathcal{X}_1, \mathcal{X}_2$, $P_{X_1,X_2} \in \mathcal{P}(\mathcal{X}_1 \times \mathcal{X}_2)$ and $\phi: \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{X}$, the capacity of the single-user mismatched DMC $P_{Y|X}$ with decoding metric $q(x, y)$ is lower-bounded by the rate

$$\max\big\{R_2'(P) + R_1'(P, R_2'(P)),\ r_1(P) + r_2(P, r_1(P))\big\}, \tag{63}$$

which is achievable by superposition coding, where $P = P_{X_1,X_2} \times P_{Y|\phi(X_1,X_2)}$ and the functions $r_1(P)$, $r_2(P, R_1)$ are defined in (59).

VII. Conclusion 

In this paper, two encoding methods for cognitive multiple access channels, a superposition coding scheme and a random binning scheme, were analyzed. Tight single-letter expressions were obtained for the resulting error exponents of these schemes. The achievable regions were characterized and proofs were provided for the random coding converse theorems. While apparently neither of the schemes dominates the other, there are certain conditions under which each of the schemes is not inferior (in terms of reliably transmitted rates) to the other scheme for a given random coding distribution $P_{X_1,X_2}$. An example was also discussed for a cognitive MAC whose achievable
region using random binning is strictly larger than that obtained by superposition coding. The matched case was 
also studied, in which the achievable regions of both superposition coding and random binning are equal to the 
capacity region, which is often strictly larger than the achievable region of the non-cognitive matched MAC. 

In certain cases, binning is more advantageous than superposition coding in terms of memory requirements: superposition coding requires the cognitive user to use a separate codebook for every possible codeword of the non-cognitive user, i.e., a collection of $2^{n(R_1+R_2)}$ codewords. Binning, on the other hand, allows encoder 2 to decrease memory requirements to $2^{nR_1} + 2^{n(R_2 + I_P(X_1;X_2))}$ codewords at the cost of increased encoding complexity.²
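To get a feel for the two storage counts, the sketch below compares them on a logarithmic scale. It assumes rates and mutual information measured in bits per channel use; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
import math

def log2_codebook_sizes(n, r1, r2, mutual_info):
    """Return log2 of the number of stored codewords for both schemes.

    Superposition: 2^{n(R1+R2)} codewords.
    Binning:       2^{nR1} + 2^{n(R2 + I_P(X1;X2))} codewords.
    """
    log2_sup = n * (r1 + r2)
    a, b = n * r1, n * (r2 + mutual_info)
    hi, lo = max(a, b), min(a, b)
    # log2(2^a + 2^b), computed without forming the huge powers themselves.
    log2_bin = hi + math.log2(1.0 + 2.0 ** (lo - hi))
    return log2_sup, log2_bin

# Hypothetical operating point.
log2_sup, log2_bin = log2_codebook_sizes(n=100, r1=0.5, r2=0.3, mutual_info=0.1)
print(f"superposition: 2^{log2_sup:.2f} codewords, binning: 2^{log2_bin:.2f} codewords")
```

In this example the binning count is dominated by the larger of the two exponents, so the saving is substantial whenever $I_P(X_1;X_2)$ is small compared with $R_1$.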

The achievability results were further specialized to obtain a lower bound on the mismatch capacity of the 
single-user channel by investigating a cognitive multiple access channel whose achievable sum-rate serves as a 
lower bound on the single-user channel's capacity. This generalizes Lapidoth's scheme [6] for a single-user channel that is based on the non-cognitive MAC. While the users of the non-cognitive MAC of [6] exercise a limited degree of cooperation (by expurgating the appropriate codewords of the product codebook), the cognitive MAC introduced here allows for a much higher degree of cooperation between users. Neither superposition coding nor random binning requires expurgating codewords, since the codebook generation guarantees that all pairs of transmitted codewords lie in the desired joint type class $T(P_{X_1,X_2})$. Additionally, Lapidoth's lower bound for the single-user channel requires optimizing over the parameters (the alphabets and distributions of the auxiliary random variables), but when the full optimization over the parameters is infeasible, the bound provided in this paper can be strictly larger and may reduce the computational complexity required to find a good code. We further show that by considering the two superposition schemes (with user 1 being the cognitive and user 2 the non-cognitive, and with reversed roles) one can achieve a sum-rate that is at least as high as that of the random binning scheme.

² In this case, the choice of $x_2(i,j)$ should not be arbitrary among the vectors that are jointly typical with $x_1(i)$ in the $j$-th bin, but should rather follow a deterministic rule, e.g., pick the jointly typical $x_2[k,j]$ with the lowest $k$.
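The deterministic selection rule mentioned in the footnote on binning (pick the suitable bin member with the lowest index $k$) can be sketched as follows. The sketch assumes that "suitable" means having an exact prescribed joint type with $x_1$, which is a simplification of joint typicality; the function names and the toy sequences are hypothetical.

```python
from collections import Counter

def joint_type(x1, x2):
    """Empirical joint type of two equal-length sequences, as symbol-pair counts."""
    return Counter(zip(x1, x2))

def select_from_bin(x1, bin_candidates, target_counts):
    """Return (k, x2) for the lowest index k whose candidate has the target joint type with x1.

    bin_candidates: candidate sequences x2[k, j] of one fixed bin j, ordered by k.
    target_counts:  Counter of pair counts describing the desired joint type.
    Returns None when the bin holds no suitable candidate (an encoding failure).
    """
    for k, x2 in enumerate(bin_candidates):
        if joint_type(x1, x2) == target_counts:
            return k, x2
    return None

# Hypothetical toy bin with blocklength 4 and binary alphabets.
x1 = (0, 0, 1, 1)
target = Counter({(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 1})
candidates = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 1, 0)]
choice = select_from_bin(x1, candidates, target)
```

Because the scan is in increasing order of `k`, both encoders and the decoder can reproduce the same choice from the shared codebook without storing it.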
Appendix

Throughout the proofs the method of types is used. For a survey of the method the reader is referred to [16, Chapter 11.1]. In particular, we use inequalities involving type sizes, such as: if $\hat P_{x,y} = Q$ then

$$c_n^{-1}\, 2^{nH_Q(X|Y)} \le |T(Q_{X,Y}|y)| \le 2^{nH_Q(X|Y)}, \tag{64}$$

where $c_n = (n+1)^{|\mathcal{X}||\mathcal{Y}|-1}$, i.e., $|T(Q_{X,Y}|y)| \doteq 2^{nH_Q(X|Y)}$. Additionally, if $A$ is an event that can be expressed as a union over type classes of $\mathcal{X}$, and $X$ is a random $n$-vector over $\mathcal{X}^n$, then, since the number of types grows polynomially with $n$, we have

$$\max_{\tilde P \in A} \Pr\big(X \in T(\tilde P)\big) \le \Pr(X \in A) \le c_n \max_{\tilde P \in A} \Pr\big(X \in T(\tilde P)\big), \tag{65}$$

i.e., $\Pr(X \in A) \doteq \max_{\tilde P \in A} \Pr\big(X \in T(\tilde P)\big)$. Finally, if $Y$ is conditionally i.i.d. given a deterministic vector $x$ with $P(Y_i = y|X_i = x) = Q_{Y|X}(y|x)$, then for $P_X = \hat P_x$

$$Q^n\big(Y \in T(P_{X,Y}|x)\,\big|\,X = x\big) \doteq 2^{-nD(P\|Q|P_X)}, \tag{66}$$

where

$$D(P\|Q|P_X) = \sum_{x,y} P_{X,Y}(x,y) \log\frac{P_{Y|X}(y|x)}{Q_{Y|X}(y|x)}. \tag{67}$$
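As a numerical sanity check of the type-size bounds, the sketch below evaluates the unconditional analogue of (64), namely $(n+1)^{-(|\mathcal{X}|-1)} 2^{nH(Q)} \le |T(Q)| \le 2^{nH(Q)}$, for one small type. The unconditional form and the chosen counts are assumptions made to keep the example short; the conditional statement in the text is not verified here.

```python
import math

def type_class_log2_size(counts):
    """log2 of |T(Q)| for the type with the given symbol counts (a multinomial coefficient)."""
    size = math.factorial(sum(counts))
    for c in counts:
        size //= math.factorial(c)
    return math.log2(size)

def entropy_bits(counts):
    """Entropy, in bits, of the empirical distribution given by the counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

counts = [30, 50, 20]  # hypothetical type over a ternary alphabet, n = 100
n, alphabet_size = sum(counts), len(counts)
log2_size = type_class_log2_size(counts)
upper = n * entropy_bits(counts)
lower = upper - (alphabet_size - 1) * math.log2(n + 1)  # polynomial factor (n+1)^{-(|X|-1)}
print(lower, log2_size, upper)
```

The gap between the two bounds grows only logarithmically in $n$, which is why such factors disappear under the exponential-equality notation.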

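Recall the definitions of the sets of p.m.f.'s introduced earlier. Next, we define similar sets of empirical p.m.f.'s. Let $Q \in \mathcal{P}_n(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y})$ be given. Define the following sets of p.m.f.'s that will be useful in what follows:

$$\mathcal{K}_n(Q) = \{f \in \mathcal{P}_n(\mathcal{X}_1 \times \mathcal{X}_2 \times \mathcal{Y}):\ f_{X_1,X_2} = Q_{X_1,X_2}\}$$

$$\mathcal{G}_{q,n}(Q) = \{f \in \mathcal{K}_n(Q):\ E_f\{q(X_1,X_2,Y)\} \ge E_Q\{q(X_1,X_2,Y)\}\}$$

$$\mathcal{L}_{1,n}(Q) = \{f \in \mathcal{G}_{q,n}(Q):\ f_{X_2,Y} = Q_{X_2,Y}\} \tag{68}$$

$$\mathcal{L}_{2,n}(Q) = \{f \in \mathcal{G}_{q,n}(Q):\ f_{X_1,Y} = Q_{X_1,Y}\} \tag{69}$$

$$\mathcal{L}_{0,n}(Q) = \{f \in \mathcal{G}_{q,n}(Q):\ f_Y = Q_Y\}. \tag{70}$$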
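The constraints defining these sets are marginal equalities plus one inequality on the expected metric, so membership is easy to test numerically. The sketch below is an illustration under assumed conventions (joint p.m.f.'s stored as arrays indexed by $(x_1, x_2, y)$, a numerical tolerance `tol`); it does not check that `f` is an empirical p.m.f. with denominator $n$.

```python
import numpy as np

def in_constraint_sets(f, Q, q, tol=1e-9):
    """Check membership of f in K(Q), G_q(Q), L_1(Q), L_2(Q) and L_0(Q).

    f, Q: joint p.m.f.'s as arrays of shape (|X1|, |X2|, |Y|);
    q:    metric array of the same shape.
    """
    def same(a, b):
        return bool(np.allclose(a, b, atol=tol))

    in_K = same(f.sum(axis=2), Q.sum(axis=2))                      # f_{X1,X2} = Q_{X1,X2}
    in_G = in_K and bool((f * q).sum() >= (Q * q).sum() - tol)     # E_f{q} >= E_Q{q}
    return {
        "K": in_K,
        "G": in_G,
        "L1": in_G and same(f.sum(axis=0), Q.sum(axis=0)),          # f_{X2,Y} = Q_{X2,Y}
        "L2": in_G and same(f.sum(axis=1), Q.sum(axis=1)),          # f_{X1,Y} = Q_{X1,Y}
        "L0": in_G and same(f.sum(axis=(0, 1)), Q.sum(axis=(0, 1))),  # f_Y = Q_Y
    }

# Hypothetical example: Q itself always belongs to every one of the sets.
rng = np.random.default_rng(0)
Q = rng.random((2, 2, 3))
Q /= Q.sum()
q = rng.random((2, 2, 3))
membership = in_constraint_sets(Q, Q, q)
```

Note that both $\mathcal{L}_{1,n}(Q)$ and $\mathcal{L}_{2,n}(Q)$ are subsets of $\mathcal{L}_{0,n}(Q)$, since matching a pairwise marginal with $Y$ implies matching the $Y$-marginal.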
A. Proof of Theorem 1

Assume without loss of generality that the transmitted messages are $(m_1, m_2) = (1,1)$, and let $W^n$ stand for $P^n_{Y|X_1,X_2}$. We have

$$\bar P^{sup}_{e,2} = \sum_{(x_1,x_2) \in T(P_{X_1,X_2}),\, y} \frac{W^n(y|x_1,x_2)}{|T(P_{X_1,X_2})|}\, e(\hat P_{x_1,x_2,y}), \tag{71}$$

where³ $e(\hat P_{x_1,x_2,y})$ is the average probability of $\{\hat W_1 = 1, \hat W_2 \ne 1\}$ given $\{(X_1, X_2, Y) = (x_1,x_2,y)\}$. Since the decoder successfully decodes $W_2$ only if $q_n(x_1(1), x_2(1,1), y) > q_n(x_1(1), x_2(1,j), y)$ for all $j \ne 1$, and since $e(\hat P_{x_1,x_2,y})$ can be regarded as the probability of at least one "success" in $M_2 - 1$ Bernoulli trials, we have



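$$e(\hat P_{x_1,x_2,y}) = 1 - \big[1 - a(\hat P_{x_1,x_2,y})\big]^{M_2-1}, \tag{72}$$

where

$$a(\hat P_{x_1,x_2,y}) = \sum_{T(\tilde P_{X_1,X_2,Y}|x_1,y):\ q_n(x_1,x_2',y) \ge q_n(x_1,x_2,y),\ \tilde P_{X_1,X_2} = P_{X_1,X_2}} \frac{|T(\tilde P_{X_1,X_2,Y}|x_1,y)|}{|T(P_{X_1,X_2}|x_1)|}, \tag{73}$$

which is the probability that $X_2'$ drawn uniformly over $T(P_{X_1,X_2}|x_1)$ will yield a metric at least as high as that of the transmitted codeword, i.e., $q_n(x_1, X_2', y) \ge q_n(x_1, x_2, y)$. From Lemma 1 in [17] we know that for $a \in [0,1]$, one has

$$\tfrac{1}{2}\min\{1, Ma\} \le 1 - [1-a]^M \le \min\{1, Ma\}. \tag{74}$$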
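The two-sided bound (74) is easy to probe numerically. The following sketch evaluates it on a small grid of $(a, M)$ values; the grid and the relative tolerance `rel_tol`, which absorbs floating-point rounding, are arbitrary choices made for illustration.

```python
import math

def union_probability(a, m):
    """Probability 1 - (1 - a)^m of at least one success in m independent trials, 0 <= a < 1."""
    # expm1/log1p keep precision when a is tiny.
    return -math.expm1(m * math.log1p(-a))

def check_bounds(a, m, rel_tol=1e-12):
    """Check 0.5*min{1, m*a} <= 1 - (1-a)^m <= min{1, m*a} up to rounding."""
    exact = union_probability(a, m)
    upper = min(1.0, m * a)
    return 0.5 * upper * (1 - rel_tol) <= exact <= upper * (1 + rel_tol)

grid = [(a, m) for a in (1e-6, 1e-3, 0.01, 0.1, 0.5, 0.99) for m in (1, 10, 1000, 10**6)]
all_ok = all(check_bounds(a, m) for a, m in grid)
print(all_ok)
```

The upper bound is the union bound truncated at one; the lower bound shows that this truncated union bound is tight up to a factor of two, which is what makes the subsequent exponents exact.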

Consequently,

$$e(\hat P_{x_1,x_2,y}) \doteq \min\big\{1, M_2\, a(\hat P_{x_1,x_2,y})\big\} = 2^{-n\left|-\frac{1}{n}\log a(\hat P_{x_1,x_2,y}) - R_2\right|^+}. \tag{75}$$

Now, by noting that the summation in (73) is over conditional type classes of $x_2$ given $(x_1, y)$ such that $E_{\tilde P}\{q(X_1,X_2,Y)\} \ge E_{\hat P_{x_1,x_2,y}}\{q(X_1,X_2,Y)\}$ and $\tilde P_{X_1,X_2} = P_{X_1,X_2}$, as $x_2$ and $x_2'$ are both drawn conditionally independently given $x_1$, and using (64) and (65), we get

$$a(\hat P_{x_1,x_2,y}) \doteq \max_{\tilde P \in \mathcal{L}_{2,n}(\hat P_{x_1,x_2,y})} \frac{|T(\tilde P_{X_1,X_2,Y}|x_1,y)|}{|T(P_{X_1,X_2}|x_1)|} \doteq 2^{-n \min_{\tilde P \in \mathcal{L}_{2,n}(\hat P_{x_1,x_2,y})} I_{\tilde P}(X_2;Y|X_1)}, \tag{76}$$

where $\mathcal{L}_{2,n}(\hat P_{x_1,x_2,y})$ is defined in (69). Using (64) and (66) and gathering (71), (75) and (76), this yields

$$\bar P^{sup}_{e,2} \doteq 2^{-n \min_{P' \in \mathcal{K}_n(P)} \left( D(P'\|P) + \min_{\tilde P \in \mathcal{L}_{2,n}(P')} \left|I_{\tilde P}(X_2;Y|X_1) - R_2\right|^+ \right)} \doteq 2^{-nE_2(P,R_2)}, \tag{77}$$



³ We write $e(\hat P_{x_1,x_2,y})$ rather than $e(x_1,x_2,y)$ because it is easily verified that the average error probability conditioned on $(x_1,x_2,y)$ is a function of the joint empirical measure $\hat P_{x_1,x_2,y}$.
where the last step follows since, by continuity of $|I_{\tilde P}(X_2;Y|X_1) - R_2|^+$ in $\tilde P$, for sufficiently large $n$ we can replace the minimization over empirical p.m.f.'s $\mathcal{L}_{2,n}(P')$ with a minimization over $\mathcal{L}_2(P')$, and in a similar manner, we can replace the minimization over $\mathcal{K}_n(P)$ with a minimization over $\mathcal{K}(P)$ and obtain exponentially equivalent
expressions. This concludes the proof of ([Tol l. 

Next, to compute the average error probability of the event of erroneously decoding $W_1$, i.e., $\hat W_1 \ne 1$, we have

$$\bar P^{sup}_{e,1} = \sum_{(x_1,x_2) \in T(P_{X_1,X_2}),\, y} \frac{W^n(y|x_1,x_2)}{|T(P_{X_1,X_2})|}\, \nu(\hat P_{x_1,x_2,y}), \tag{78}$$

where

$$\nu(\hat P_{x_1,x_2,y}) \doteq \sum_{T(\tilde P_{X_1,X_2,Y}|y):\ q_n(\tilde x_1,\tilde x_2,y) \ge q_n(x_1,x_2,y),\ \tilde P_{X_1,X_2} = P_{X_1,X_2}} \Pr\big\{\exists\, i \ne 1, j:\ (X_1(i), X_2(i,j)) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}. \tag{79}$$



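Now, fix some $m \in \{2, \ldots, M_1\}$ and note that $\nu(\hat P_{x_1,x_2,y})$ can be regarded as the probability of at least one "success" in $M_1 - 1$ Bernoulli trials, i.e.,

$$\Pr\big\{\exists\, i \ne 1, j:\ (X_1(i),X_2(i,j)) \in T(\tilde P_{X_1,X_2,Y}|y)\big\} = 1 - \Big[\Pr\big\{\nexists\, j:\ (X_1(m),X_2(m,j)) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}\Big]^{M_1-1}. \tag{80}$$

The event $\{\nexists\, j:\ (X_1(m),X_2(m,j)) \in T(\tilde P_{X_1,X_2,Y}|y)\}$ is the union of two disjoint events:

• $A \triangleq \{X_1(m) \notin T(\tilde P_{X_1,Y}|y)\}$

• $B \triangleq \{X_1(m) \in T(\tilde P_{X_1,Y}|y) \text{ and } \nexists\, j:\ (X_1(m), X_2(m,j)) \in T(\tilde P_{X_1,X_2,Y}|y)\}$.

Since $X_1(m)$ is drawn uniformly over $T(\tilde P_{X_1}) = T(P_{X_1})$, we have from (64)

$$\Pr\{A\} = 1 - \frac{|T(\tilde P_{X_1,Y}|y)|}{|T(P_{X_1})|} \doteq 1 - 2^{-nI_{\tilde P}(X_1;Y)} \tag{81}$$

and

$$\Pr\{B\} = \frac{|T(\tilde P_{X_1,Y}|y)|}{|T(P_{X_1})|} \left[1 - \frac{|T(\tilde P_{X_1,X_2,Y}|x_1,y)|}{|T(P_{X_1,X_2}|x_1)|}\right]^{M_2} \doteq 2^{-nI_{\tilde P}(X_1;Y)} \left[1 - 2^{-nI_{\tilde P}(X_2;Y|X_1)}\right]^{M_2}. \tag{82}$$

Therefore, since the events are disjoint,

$$\begin{aligned}
\Pr\{A \cup B\} &= \Pr\{A\} + \Pr\{B\} \\
&\doteq 1 - 2^{-nI_{\tilde P}(X_1;Y)} + 2^{-nI_{\tilde P}(X_1;Y)} \left[1 - 2^{-nI_{\tilde P}(X_2;Y|X_1)}\right]^{M_2} \\
&= 1 - 2^{-nI_{\tilde P}(X_1;Y)} \left(1 - \left[1 - 2^{-nI_{\tilde P}(X_2;Y|X_1)}\right]^{M_2}\right) \\
&\doteq 1 - 2^{-n\left(I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+\right)},
\end{aligned} \tag{83}$$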
where the last step follows from (74) applied to $a = 2^{-nI_{\tilde P}(X_2;Y|X_1)}$, and from (80) we have

$$\begin{aligned}
\Pr\big\{\exists\, i \ne 1, j:\ (X_1(i),X_2(i,j)) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}
&\doteq 1 - \left[1 - 2^{-n\left(I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+\right)}\right]^{M_1-1} && (84) \\
&\doteq 2^{-n\left|I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+ - R_1\right|^+}, && (85)
\end{aligned}$$

which follows from (74) applied to $a = 2^{-n\left(I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+\right)}$.

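Now, by noting that the summation in (79) is over conditional type classes of $(x_1, x_2)$ given $y$ such that $E_{\tilde P}\{q(X_1,X_2,Y)\} \ge E_{\hat P_{x_1,x_2,y}}\{q(X_1,X_2,Y)\}$ and $\tilde P_{X_1,X_2} = P_{X_1,X_2}$, and using (65), we get

$$\nu(\hat P_{x_1,x_2,y}) \doteq \max_{\tilde P \in \mathcal{L}_{0,n}(\hat P_{x_1,x_2,y})} 2^{-n\left|I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+ - R_1\right|^+}, \tag{86}$$

where $\mathcal{L}_{0,n}(\hat P_{x_1,x_2,y})$ is defined in (70).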

Using (66) and gathering (78) and (86), this yields

$$\bar P^{sup}_{e,1} \doteq 2^{-n \min_{P' \in \mathcal{K}_n(P)} \left( D(P'\|P) + \min_{\tilde P \in \mathcal{L}_{0,n}(P')} \left|I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+ - R_1\right|^+ \right)} \doteq 2^{-nE_1(P,R_1,R_2)}, \tag{87}$$

where the last step follows since, by continuity of $\left|I_{\tilde P}(X_1;Y) + |I_{\tilde P}(X_2;Y|X_1) - R_2|^+ - R_1\right|^+$ in $\tilde P$, for sufficiently large $n$ we can replace the minimization over empirical p.m.f.'s $\mathcal{L}_{0,n}(P')$ with a minimization over $\mathcal{L}_0(P')$, and similarly, we can replace the minimization over $\mathcal{K}_n(P)$ with a minimization over $\mathcal{K}(P)$ and obtain exponentially equivalent expressions.

B. Proof of Theorem 5

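The first observation, similarly to [18], is that the probability that for given $(m_1, m_2)$ no $k \in \{1, \ldots, 2^{n\gamma}\}$ exists such that $(x_1(m_1), x_2[k, m_2]) \in T(P_{X_1,X_2})$ vanishes super-exponentially fast provided that $\gamma > I_P(X_1;X_2) + \epsilon$ for some $\epsilon > 0$, since

$$\begin{aligned}
\Pr\big\{\nexists\, k:\ (X_1(m_1), X_2[k,m_2]) \in T(P_{X_1,X_2})\big\}
&= \left(1 - \frac{|T(P_{X_1,X_2}|x_1)|}{|T(P_{X_2})|}\right)^{2^{n\gamma}} \\
&\doteq \left(1 - 2^{-nI_P(X_1;X_2)}\right)^{2^{n\gamma}} \\
&\le \exp\big\{-2^{n(\gamma - I_P(X_1;X_2))}\big\}.
\end{aligned} \tag{88}$$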
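The last inequality in (88) is the elementary bound $(1-p)^N \le e^{-pN}$ with $p = 2^{-nI_P(X_1;X_2)}$ and $N = 2^{n\gamma}$. The sketch below evaluates both sides on a logarithmic scale for a few blocklengths; the numerical values of $\gamma$ and $I_P(X_1;X_2)$ are hypothetical, and polynomial factors in $n$ are ignored.

```python
import math

def log_prob_no_typical_candidate(n, gamma, mutual_info):
    """Natural log of (1 - 2^{-n I})^{2^{n gamma}}, ignoring polynomial factors in n."""
    p = 2.0 ** (-n * mutual_info)  # chance that a single candidate is jointly typical
    trials = 2.0 ** (n * gamma)    # number of candidates in the bin
    return trials * math.log1p(-p)

def log_bound(n, gamma, mutual_info):
    """Natural log of the bound exp{-2^{n(gamma - I)}}."""
    return -(2.0 ** (n * (gamma - mutual_info)))

# Hypothetical values: gamma exceeds I_P(X1;X2) by 0.05 bit.
results = []
for n in (50, 100, 200):
    exact = log_prob_no_typical_candidate(n, gamma=0.35, mutual_info=0.30)
    bound = log_bound(n, gamma=0.35, mutual_info=0.30)
    results.append((n, exact, bound))
    print(n, exact, bound)
```

The logarithm of the failure probability itself decays exponentially in $n$, i.e., the probability decays doubly exponentially, which is what allows the union bound over all $2^{n(R_1+R_2)}$ message pairs in the next step.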

Moreover, from the union bound over $(m_1, m_2)$ we have that the probability that there exists $(m_1, m_2)$ such that $(x_1(m_1), x_2[k, m_2]) \notin T(P_{X_1,X_2})$ for all $k$ vanishes super-exponentially fast; therefore, we have

$$\Pr\{X_2(m_1,m_2) = x_2 \,|\, X_1(m_1) = x_1\} = \frac{1\{x_2 \in T(P_{X_1,X_2}|x_1)\}}{|T(P_{X_1,X_2}|x_1)|}, \tag{89}$$

i.e., uniform in the appropriate conditional type class.

Assume without loss of generality that $x_2(1,1) = x_2[1,1]$. As a direct consequence of (89), $\bar P^{bin}_{e,2}$ can be calculated similarly to its calculation for the superposition coding scheme, yielding

$$\bar P^{bin}_{e,2} \doteq 2^{-nE_2(P,R_2)}. \tag{90}$$



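The calculation of $\bar P^{bin}_{e,1}$, on the other hand, differs from that of the superposition coding, since the last step of that derivation is no longer valid: it does not correspond to $2^{nR_1} - 1$ independent Bernoulli experiments. Hence,

$$\bar P^{bin}_{e,1} = \sum_{(x_1,x_2) \in T(P_{X_1,X_2}),\, y} \frac{W^n(y|x_1,x_2)}{|T(P_{X_1,X_2})|}\, \zeta(\hat P_{x_1,x_2,y}), \tag{91}$$

where

$$\zeta(\hat P_{x_1,x_2,y}) \doteq \sum_{T(\tilde P_{X_1,X_2,Y}|y):\ q_n(\tilde x_1,\tilde x_2,y) \ge q_n(x_1,x_2,y),\ \tilde P_{X_1,X_2} = P_{X_1,X_2}} \Pr\big\{\exists\, i \ne 1, (k,j):\ (X_1(i), X_2[k,j]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}, \tag{92}$$

where $i \in \{2, \ldots, 2^{nR_1}\}$, $j \in \{1, \ldots, 2^{nR_2}\}$, $k \in \{1, \ldots, 2^{n\gamma}\}$.

We distinguish between two cases, $(k,j) \ne (1,1)$ and $(k,j) = (1,1)$, since

$$\begin{aligned}
\Pr\big\{\exists\, i \ne 1, (k,j):\ (X_1(i), X_2[k,j]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}
&\doteq \Pr\big\{\exists\, i \ne 1:\ (X_1(i), X_2[1,1]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\} \\
&\quad + \Pr\big\{\exists\, i \ne 1, (k,j) \ne (1,1):\ (X_1(i), X_2[k,j]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}.
\end{aligned} \tag{93}$$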

case a: (fc, j) 7^ (1,1): In this case we use Lemma 3 of Q to obtain: 

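$$\Pr\big\{\exists\, i \ne 1, (k,j) \ne (1,1):\ (X_1(i), X_2[k,j]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\} \doteq 2^{-n\max\{\psi_1(\tilde P,R_1,R_2),\ \psi_2(\tilde P,R_1,R_2)\}}, \tag{94}$$

where

$$\begin{aligned}
\psi_1(\tilde P,R_1,R_2) &\triangleq \Big| I_{\tilde P}(X_1;Y) + \big|I_{\tilde P}(X_2;Y,X_1) - R_2 - \gamma\big|^+ - R_1 \Big|^+ \\
\psi_2(\tilde P,R_1,R_2) &\triangleq \Big| I_{\tilde P}(X_2;Y) + \big|I_{\tilde P}(X_1;Y,X_2) - R_1\big|^+ - R_2 - \gamma \Big|^+ .
\end{aligned} \tag{95}$$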



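Since $\gamma = I_P(X_1;X_2) + \epsilon$ where $\epsilon$ is arbitrarily small,⁴ the functions $\psi_1(\tilde P,R_1,R_2)$ and $\psi_2(\tilde P,R_1,R_2)$ converge to

$$\begin{aligned}
\bar\psi_1(\tilde P,R_1,R_2) &\triangleq \Big| I_{\tilde P}(X_1;Y) + \big|I_{\tilde P}(X_2;Y|X_1) - R_2\big|^+ - R_1 \Big|^+ \\
\bar\psi_2(\tilde P,R_1,R_2) &\triangleq \Big| I_{\tilde P}(X_2;Y) - I_{\tilde P}(X_2;X_1) + \big|I_{\tilde P}(X_1;Y,X_2) - R_1\big|^+ - R_2 \Big|^+ ,
\end{aligned} \tag{96}$$

respectively, as $n$ tends to infinity.

⁴ In fact, $\epsilon$ can be replaced, for example, with $\epsilon_n = n^{-1/2}$ to guarantee that the r.h.s. of (88) vanishes super-exponentially fast.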
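case b: $(k,j) = (1,1)$: In this case we have

$$\begin{aligned}
\Pr\big\{\exists\, i \ne 1:\ (X_1(i), X_2[1,1]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\}
&\doteq 1 - \left[1 - 2^{-nI_{\tilde P}(X_1;Y,X_2)}\right]^{M_1-1} \\
&\overset{(a)}{\doteq} \min\big\{1, M_1 2^{-nI_{\tilde P}(X_1;Y,X_2)}\big\} \\
&= 2^{-n\left|I_{\tilde P}(X_1;Y,X_2) - R_1\right|^+},
\end{aligned} \tag{97}$$

where (a) follows from (74). Gathering the two cases we obtain

$$\Pr\big\{\exists\, i \ne 1, (k,j):\ (X_1(i), X_2[k,j]) \in T(\tilde P_{X_1,X_2,Y}|y)\big\} \doteq \max\Big\{2^{-n\left|I_{\tilde P}(X_1;Y,X_2) - R_1\right|^+},\ 2^{-n\max\{\bar\psi_1(\tilde P,R_1,R_2),\ \bar\psi_2(\tilde P,R_1,R_2)\}}\Big\}. \tag{98}$$

This yields, similarly to (87),

$$\bar P^{bin}_{e,1} \doteq 2^{-n\min\{E_0(P,R_1,R_2),\ E_1(P,R_1)\}}. \tag{99}$$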

C. Proof of Theorem 2

The proof of the random coding converse follows the line of proof of Theorem 3 of [15]. For the sake of completeness, the proof outline is repeated here and the differences between the proofs are reiterated. Recall that $P_{X_1,X_2} \in \mathcal{P}_n(\mathcal{X}_1 \times \mathcal{X}_2)$ is the random coding distribution. We need to show that if the inequality

$$R_1 + R_2 \le \min_{\tilde P \in \mathcal{L}^{sup}_0(P)} I_{\tilde P}(X_1,X_2;Y) \triangleq \Phi(R_1,R_2,P) \tag{100}$$

is violated, the ensemble average error probability tends to one as $n$ tends to infinity. The proof of the claim that the first inequality in (22), i.e., $R_2 \le \min_{\tilde P \in \mathcal{L}_2(P)} I_{\tilde P}(X_2;Y|X_1)$, is a necessary condition for a vanishingly low average probability of error is simpler and thus omitted.
We follow the steps in [6]:

• Step 1: If $R_1 + R_2 > \min_{\tilde P \in \mathcal{L}^{sup}_0(P)} I_{\tilde P}(X_1,X_2;Y)$ then there exists a p.m.f. $\tilde P \in \mathcal{L}^{sup}_0(P)$ such that

$$R_1 + R_2 > I_{\tilde P}(X_1,X_2;Y) \tag{101}$$
$$R_1 > I_{\tilde P}(X_1;Y) \tag{102}$$
$$E_{\tilde P}\{q\} > E_P\{q\}. \tag{103}$$

This follows directly as in [6] by the convexity of the function $\Phi(R_1, R_2, P)$ defined in (100).

• Step 2: By the continuity of the relative entropy functional and by Step 1 we can find some $\Delta > 0$, some $\epsilon > 0$, and a neighborhood $U$ of $\tilde P$ such that for every $f \in U$ and every $y \in T^n_\epsilon(P_Y)$ (where $T^n_\epsilon(P_Y)$ is the set of strongly $\epsilon$-typical sequences w.r.t. $P_Y$),

$$\begin{aligned}
R_1 + R_2 &> D(f_{X_1,X_2|Y} \| P_{X_1,X_2} | \hat P_y) + \Delta \\
R_1 &> D(f_{X_1|Y} \| P_{X_1} | \hat P_y) + \Delta \\
E_f\{q\} &> E_P\{q\} + \Delta.
\end{aligned} \tag{104}$$

We next choose a sufficiently small neighborhood $V$ of $P$ so that for every $\mu \in V$ we have

$$E_\mu\{q\} < E_P\{q\} + \Delta, \qquad \mu_Y \in T^n_\epsilon(P_Y). \tag{105}$$

We can thus conclude that if the triple $(x_1(1), x_2(1,1), y)$ has an empirical type in $V$, and if there exist codewords $(x_1(i), x_2(i,j))$ with $i \ne 1$ such that the empirical type of $(x_1(i), x_2(i,j), y)$ is in $U$, then a decoding error must occur.

• Step 3: We will show that the probability that there is no $(i \ne 1, j)$ such that the joint type of $(x_1(i), x_2(i,j), y)$ is in $U$ vanishes as $n$ tends to infinity. We can thus conclude that, provided that $y$ is $\epsilon$-typical w.r.t. $P_Y$, the conditional probability that there exist some $i \ne 1$ and $j$ such that the joint type of $(x_1(i), x_2(i,j), y)$ is in $U$ approaches one. In particular, by (104), with probability approaching one, there exists a pair of incorrect codewords that accumulate a metric higher than $E_P\{q\} + \Delta$.

We have shown in (84) that for every $T(Q_{X_1,X_2|Y}|y)$ such that the marginal distribution of $X_1, X_2$ induced by $Q_{Y,X_1,X_2} = \hat P_y \times Q_{X_1,X_2|Y}$ is $P_{X_1,X_2}$,

$$\Pr\big\{\nexists\, i \ne 1, j:\ (X_1(i), X_2(i,j)) \in T(Q_{X_1,X_2,Y}|y)\big\} \doteq \left[1 - 2^{-n\left(I_Q(X_1;Y) + |I_Q(X_2;Y|X_1) - R_2|^+\right)}\right]^{M_1-1}, \tag{106}$$

where $M_1 = 2^{nR_1}$, and it is easily verified to be vanishing as $n$ tends to infinity if $R_1 > I_Q(X_1;Y)$ and $R_1 + R_2 > I_Q(X_1,X_2;Y)$.

• Step 4: By the LLN, the probability that the joint type of $(X_1(1), X_2(1,1), Y)$ is in $V$ approaches one as the blocklength tends to infinity. In this event, by (105), the correct codewords accumulate a metric that is smaller than $E_P\{q\} + \Delta$, and thus by (104) an error is bound to occur if there exists such a codeword pair. Since, given $y$, the complement of the conditional probability (106) approaches one for any $\epsilon$-typical $y$, and since all p.m.f.'s in $V$ have marginals in $T^n_\epsilon(P_Y)$, it follows that the probability of error approaches one as the blocklength tends to infinity, and the theorem is proved.
D. Proof of Corollary 1

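It remains to prove that $\tilde{\mathcal{R}}^{sup}_{cog}(P) \subseteq \mathcal{R}^{sup}_{cog}(P)$. To realize this, fix $R_2$ and let $R_1^*$ be the corresponding maximal rate of user 1 resulting from $\mathcal{R}^{sup}_{cog}(P)$ (21), i.e., the rate $R_1^*$ which satisfies

$$R_1^* = \min_{\tilde P \in \mathcal{L}_0(P)} I_{\tilde P}(X_1;Y) + \big|I_{\tilde P}(X_2;Y|X_1) - R_2\big|^+ . \tag{107}$$

Now, observe that since $|t|^+ \ge t$, $\forall t$, we have

$$R_1^* = \min_{\tilde P \in \mathcal{L}_0(P):\ I_{\tilde P}(X_1;Y) \le R_1^*} I_{\tilde P}(X_1;Y) + \big|I_{\tilde P}(X_2;Y|X_1) - R_2\big|^+ . \tag{108}$$

This implies that $\mathcal{R}^{sup}_{cog}(P)$ is equivalent to

$$\left\{(R_1,R_2):\ \begin{array}{l} R_2 < \min_{\tilde P \in \mathcal{L}_2(P)} I_{\tilde P}(X_2;Y|X_1), \\ R_1 < \min_{\tilde P \in \mathcal{L}^{sup}_0(P)} I_{\tilde P}(X_1;Y) + \big|I_{\tilde P}(X_2;Y|X_1) - R_2\big|^+ \end{array}\right\}, \tag{109}$$

which obviously contains $\tilde{\mathcal{R}}^{sup}_{cog}(P)$ since $|t|^+ \ge t$, $\forall t$, and implies that $\tilde{\mathcal{R}}^{sup}_{cog}(P) \subseteq \mathcal{R}^{sup}_{cog}(P)$.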

Another way to realize that TZ S C ^(P) C 11™p(P) is to consider Ei(P,Ri,R 2 ) flUl and note that whenever 
Ri > Ip{Xx-Y) we have \Ip(X\;Y) + \Ip(X 2 ;Y\X 1 ) - R 2 \ + - Ri\ + > 0, and that \Ip{X 2 ; Y\X t ) - R 2 \ + 
>Ip{X 2 ;Y\X x )-R 2 . 



E. Proof of Corollary 2

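It remains to prove that $\tilde{\mathcal{R}}^{bin}_{cog}(P) \subseteq \mathcal{R}^{bin}_{cog}(P)$. It can be shown similarly to Corollary 1 (see (107)-(109)) that the union of the following regions is equivalent to $\mathcal{R}^{bin}_{cog}(P)$ (31):

$$\left\{(R_1,R_2):\ \begin{array}{l} R_1 < R_1'(P), \\ R_2 < R_2'(P), \\ R_1 < \min_{\tilde P \in \mathcal{L}_0(P):\ I_{\tilde P}(X_1;Y) \le R_1} I_{\tilde P}(X_1;Y) + \big|I_{\tilde P}(X_2;Y|X_1) - R_2\big|^+ \end{array}\right\} \tag{110}$$

and

$$\left\{(R_1,R_2):\ \begin{array}{l} R_1 < R_1'(P), \\ R_2 < R_2'(P), \\ R_2 < \min_{\tilde P \in \mathcal{L}_0(P):\ I_{\tilde P}(X_2;Y) - I_{\tilde P}(X_2;X_1) \le R_2} I_{\tilde P}(X_2;Y) - I_{\tilde P}(X_2;X_1) + \big|I_{\tilde P}(X_1;Y,X_2) - R_1\big|^+ \end{array}\right\}. \tag{111}$$

Clearly, this union contains the region

$$\left\{(R_1,R_2):\ \begin{array}{l} R_1 < R_1'(P), \\ R_2 < R_2'(P), \\ R_1 + R_2 < \max\left\{ \min_{\tilde P \in \mathcal{L}_0(P):\ I_{\tilde P}(X_1;Y) \le R_1} I_{\tilde P}(X_1,X_2;Y),\ \min_{\tilde P \in \mathcal{L}_0(P):\ I_{\tilde P}(X_2;Y) - I_{\tilde P}(X_2;X_1) \le R_2} I_{\tilde P}(X_1,X_2;Y) \right\} \end{array}\right\}. \tag{112}$$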

Note that $I_{\tilde P}(X_1,X_2;Y)$ is convex in $\tilde P_{X_1,X_2|Y}$ for fixed $\tilde P_Y$ (and by definition of $\tilde P \in \mathcal{L}_0(P)$, $\tilde P_Y = P_Y$ is fixed and not minimized), and since the sets over which the minimizations are performed are convex, the sum-rate bound in (112) is equal to the sum-rate bound

Ri+R 2 < min X x ; Y). (113) 

This yields that $\tilde{\mathcal{R}}^{bin}_{cog}(P) \subseteq \mathcal{R}^{bin}_{cog}(P)$, and concludes the proof of Corollary 2.